[LeetCode] 28. Implement strstr() (KMP super detailed explanation, sunday solution and other five methods, java implementation)

[LeetCode] 28. Implement strstr() (KMP super detailed explanation, sunday solution and other five methods, java implementation)

topic

link

analysis

Overview

This question is to find the needle string in the haystack string. 3.solutions are given below, all of which are based on sliding windows.

The simplest solution is to compare substrings one by one. The sliding window of length L is gradually moved along the haystack string, and the substring in the window is compared with the needle string. The time complexity is O((N-L)L )

Show the above method can be optimized. Although the double pointer method is also linear time complexity, it can avoid comparing all substrings. Therefore, the time complexity in the optimal case is O(N), but the time complexity in the worst case is still O((N -L)L).

Is there a solution with O(N) complexity? The answer is yes, there are two ways to achieve it:

  • Rabin-Karp, through a hash algorithm to achieve string comparison within a constant time window.
  • Bit operation, through bit mask to achieve string comparison within a constant time window.

Method 1: Compare the substrings one by one-linear time complexity

The most straightforward method-move the sliding window step by step along the character change, and compare the substring in the window with the needle string.

achieve

class Solution { public int strStr (String haystack, String needle) { int L = needle.length(), n = haystack.length(); for ( int start = 0 ; start <n-L + 1 ; ++start) { if (haystack.substring(start, start + L).equals(needle)) { return start; } } return - 1 ; } } Copy code

Complexity analysis

  • Time complexity: O((N-L)L), where N is the length of the haystack string and L is the length of the needle string. The complexity of comparing strings in the inner loop is L, and a total of (N-L) comparisons are required.
  • Space complexity: O(1).

Method 2: Double pointer-linear time complexity

The flaw of the previous method is that all substrings of length L of haystack are compared with the needle string, which is actually unnecessary.

1. the comparison is only required when the first character of the substring is the same as the first character of the needle string.

Secondly, it can be compared character by character, and once it does not match, it will terminate immediately.

As shown in the figure below, a mismatch is found when the last bit is compared, and the backtracking starts at this time. Note that, the pointer is moved to the PN pn = pn - curr_len + 1 position, rather than pn = pn - curr_len position.

At this time, we compare again and find a complete matching substring, and return the starting position pn-L of the substring directly.

algorithm

  • Move the pn pointer until the character at the position pointed to by pn is equal to the first character of the needle string.
  • Calculate the matching length through pn, pL, and curr_len.
  • If it matches exactly (ie curr_len == L), return the starting coordinates of the matched substring (ie pn-L).
  • If it doesn't match exactly, backtrack. Let pn = pn-curr_len + 1, pL = 0, and curr_len = 0.

achieve

class Solution { public int strStr (String haystack, String needle) { int L = needle.length(), n = haystack.length(); if (L == 0 ) return 0 ; int pn = 0 ; while (pn <n-L + 1 ) { //find the position of the first needle character //in the haystack string while (pn <n-L + 1 && haystack.charAt(pn) != needle.charAt( 0 )) ++pn; //compute the max match string int currLen = 0 , pL = 0 ; while (pL <L && pn <n && haystack.charAt(pn) == needle.charAt(pL)) { ++pn; ++pL; ++currLen; } //if the whole needle string is found, //return its start position if (currLen == L) return pn-L; //otherwise, backtrack pn = pn-currLen + 1 ; } return - 1 ; } } Copy code

Complexity analysis

  • Time complexity: The worst time complexity is O((N-L)L), and the optimal time complexity is O(N).
  • Space complexity: O(1).

Method 3: Rabin Karp-constant complexity

There is an algorithm whose worst time complexity is also O(N). The idea is this, first generate the hash code of the substring in the window, and then compare it with the hash code of the needle string.

This idea has a problem to be solved, how to generate the hash code of the substring in constant time?

Rolling hash: constant time to generate hash code

It takes O(L) time to generate a hash code with a length of L array.

How to generate the hash code of the sliding window array in constant time? Using the characteristics of the sliding window, one element enters and one comes out every time you slide.

Since only lowercase English letters appear, the string can be converted into an array of integers with values from 0 to 25:

arr[i] = (int)S.charAt(i)-(int)'a'
. According to this rule,
abcd
The integer array form is
[0, 1, 2, 3]
, The conversion formula is shown below.

h 0 = 0 2 6 3 + 1 2 6 2 + 2 2 6 1 + 3 2 6 0 h_0 = 0/times 26^3 + 1/times 26^2 + 2/times 26^1 + 3/times 26^0 h0 =0 263+1 262+2 261+3 260

The above formula can be written as a general formula, as shown below. Where c_i c**i is an element in an integer array, a = 26 a = 26, which is the number of character sets.

h 0 = c 0 a L 1 + c 1 a L 2 +... + cia L 1 i +... + c L 1 a 1 + c L a 0 0 h_0 = c_0 a^{ L-1} + c_1 a^{L-2} + ... + c_i a^{L-1-i} + ... + c_{L-1} a^1 + c_L a^00 h0 = c0 aL 1+c1 aL 2+...+ci aL 1 i+...+cL 1 a1+cL a00

h 0 = i = 0 L 1 cia L 1 i h_0 =/sum_{i = 0}^{L-1}{c_i a^{L-1-i}} h0 = i=0L 1 ci aL 1 i

Let s consider the window from

abcd
Slide to
bcde
Case. At this time, the integer array is from
[0, 1, 2, 3]
became
[1, 2, 3, 4]
, The leftmost 0 of the array is removed, and the rightmost 4 is newly added. The hash value of the array after sliding can be calculated according to the hash value of the array before sliding. The calculation formula is as follows.

h 1 = (h 0 0 2 6 3) 26 + 4 2 6 0 h_1 = (h_0-0/times 26^3)/times 26 + 4/times 26^0 h1 =(h0 0 263) 26+4 260

Written as a general formula as shown below.

h 1 = (h 0 a c 0 a L) + c L + 1 h_1 = (h_0 a-c_0 a^L) + c_(L + 1) h1 =(h0 a c0 aL)+cL +1

How to avoid overflow

a^L a**L may be a very large number, so you need to set an upper limit to avoid overflow. The upper limit of the value can be set by modulo, namely

h% modulus
To replace the original hash value.

Theoretically, modules should be a large number, but what should be the specific number? See this article for details . For this problem, $ 2^{31} $ is enough.

algorithm

  • Calculate substring

    haystack.substring(0, L)
    with
    needle.substring(0, L)
    The hash value.

  • Traverse from the starting position: traverse from the first character to the N-Lth character.

    • Calculate the rolling hash based on the previous hash value.
    • If the hash value of the substring is equal to the hash value of the needle string, the starting position of the sliding window is returned.
  • -1 is returned, and the needle string does not exist in the haystack string at this time.

achieve

class Solution { //function to convert character to integer public int charToInt ( int idx, String s) { return ( int )s.charAt(idx)-( int ) 'a' ; } public int strStr (String haystack, String needle) { int L = needle.length(), n = haystack.length(); if (L> n) return - 1 ; //base value for the rolling hash function int a = 26 ; //modulus value for the rolling hash function to avoid overflow long modulus = ( long )Math.pow( 2 , 31 ); //compute the hash of strings haystack[:L], needle[:L] long h = 0 , ref_h = 0 ; for ( int i = 0 ; i <L; ++i) { h = (h * a + charToInt(i, haystack))% modulus; ref_h = (ref_h * a + charToInt(i, needle))% modulus; } if (h == ref_h) return 0 ; //const value to be used often: a**L% modulus long aL = 1 ; for ( int i = 1 ; i <= L; ++i) aL = (aL * a)% modulus; for ( int start = 1 ; start <n-L + 1 ; ++start) { //compute rolling hash in O(1) time h = (h * a-charToInt(start- 1 , haystack) * aL + charToInt(start + L- 1 , haystack))% modulus; if (h == ref_h) return start; } return - 1 ; } } Copy code

Complexity analysis

  • Time complexity: O(N), it takes O(L) O ( L ) time to calculate the hash value of the needle string , and then it needs to be executed (N-L)( N L ) cycles , and the calculation of each cycle is complicated Degree is constant.
  • Space complexity: O(1).

Method 4: Sunday algorithm

1. Sunday matching mechanism

The matching mechanism is very easy to understand:

  • Target string
    String
  • Pattern string
    Pattern
  • Current query index
    idx
    (Initially 0)
  • String to be matched
    str_cut
    :
    String [idx: idx + len(Pattern)]

Are each hit from the target string extracted to be matched string with the pattern string matching:

  • If it matches, return the current

    idx

  • Do not match, check to be matched string after a character

    c
    :

    1. If
      c
      Exist in
      Pattern
      In, then
      idx = idx + offset table [c]
    2. otherwise,
      idx = idx + len(pattern)

Repeat Loop until

idx + len(pattern)> len(String)


2. the offset table

Using the offset table is stored in each of the pattern string of characters that appear in the pattern string rightmost position +1 appears to distance the tail, e.g. aab:

  • The offset of a is
    len(pattern)-1 = 2
  • The offset of b is
    len(pattern)-2 = 1
  • The others are
    len(pattern)+1 = 4

To summarize:


3. for example

String:

checkthisout

Pattern:
this

Step 1:

  • idx = 0
  • The string to be matched is:
    chec
  • because
    chec != this
  • So check
    chec
    Next character
    k
  • k
    Not in pattern
  • So look at the offset table,
    idx = idx + 5

Step 2:

  • idx = 5
  • The string to be matched is:
    this
  • because
    this == this
  • Matches, so return 5

4. algorithm analysis

Worst case: O(nm)
Average case: O(n)


5. the code

class Solution : def strStr ( self , haystack : str , needle : str ) -> int : # Func : Calculate the offset table def calShiftMat ( st ): dic = () for i in range (len(st) -1,-1,-1): if not dic. get (st[i]) : dic[st[i]] = len(st)-i dic[ "ot" ] = len(st)+ 1 return dic # Other situation judgment if len (needle) > len (haystack) :return -1 if needle == "" : return 0 # Offset table preprocessing dic = calShiftMat(needle) idx = 0 while idx+len(needle) <= len(haystack): # String to be matched str_cut = haystack[idx:idx+len(needle)] # Determine if it matches if str_cut==needle: return idx else : # Boundary processing if idx+len(needle) >= len(haystack): return - 1 # In case of mismatch, move idx according to the offset of the next character cur_c = haystack[idx+len(needle)] if dic.get(cur_c): idx += dic[cur_c] else : idx += dic[ "ot" ] return - . 1 IF IDX + len (Needle)> = len (of haystack) the else IDX duplicated code

Method five: KMP algorithm

The KMP algorithm (Knuth-Morris-Pratt algorithm) is a well-known string matching algorithm, which is very efficient, but it is indeed a bit complicated.

Many readers complain that the KMP algorithm cannot be understood. This is normal. Thinking about the explanation of the KMP algorithm in university textbooks, I don't know how many future Knuth, Morris, and Pratt have been dismissed in advance. There are some excellent students who help understand the algorithm by pushing the KMP algorithm by hand. This is one way, but this article will help readers understand the principle of the algorithm from a logical level. Between ten lines of code, KMP was wiped out.

First agreed at the beginning, this article uses

pat
Represents the pattern string, the length is
M
,
TXT
Represents a text string, the length is
N
. The KMP algorithm is
TXT
Find substring
pat
If it exists, return the starting index of this substring, otherwise return -1
.

Why I think the KMP algorithm is a dynamic programming problem, I will explain it later. For dynamic programming, we have repeatedly emphasized the need to be clear

dp
The meaning of an array, and the same problem may have more than one definition
dp
The method of the meaning of the array, different definitions will have different solutions.

The KMP algorithm that readers have seen should be a wave of weird operations

pat
After forming a one-dimensional array
next
, And then go through another wave of complex operations to match according to this array
TXT
. The time complexity is O(N), and the space complexity is O(M). Actually this
next
The array is equivalent to
dp
Array, where the meaning of the elements follows
pat
The prefix is related to the suffix, and the judgment rules are more complicated and difficult to understand. This article uses a two-dimensional
dp
Array (but the space complexity is still O(M)), redefining the meaning of the elements in it, greatly reducing the code length and greatly improving the interpretability
.

PS: The code in this article refers to "Algorithm 4", the name of the array used by the original code is

dfa
(Determine the finite state machine), I made a little modification to the code in the book, and continue to use
dp
The name of the array.

This article will use dynamic programming algorithm design techniques (inductive ideas), so I hope readers have read this article "The Longest Increasing Subsequence of Dynamic Programming Design", it is easy to understand.

1. Overview of KMP algorithm

First of all, let's briefly introduce the difference between the KMP algorithm and the brute force matching algorithm, where are the difficulties, and what is the relationship with dynamic programming.

The violent string matching algorithm is easy to write. Take a look at its operating logic:

//Brute force matching (pseudocode) int search (String pat, String txt) { int M = pat.length; int N = txt.length; for ( int i = 0 ; i <N-M; i++) { int j ; for (j = 0 ; j <M; j++) { if (pat[j] != txt[i+j]) break ; } //pat all match if (j == M) return i; } //pat substring does not exist in txt return - 1 ; } Copy code

For brute force algorithms, if there are unmatched characters, fall back at the same time

TXT
with
pat
The pointer, nested for loop, time complexity O(MN), space complexity O(1). The main problem is that if there are many repeated characters in the string, the algorithm appears stupid.

For example, txt = "aaacaaab" pat = "aaab":

It is clear,

pat
There is no character c in it, and there is no need to roll back the pointer
i
, The violent solution obviously does a lot of unnecessary operations.

The difference of the KMP algorithm is that it will spend space to record some information, which will be very smart in the above situation:

Another example is the similar txt = "aaaaaaab" pat = "aaab", the violent solution will fall back the pointer as stupidly as the above example

i
, And the KMP algorithm will be smart:

Because the KMP algorithm knows that the character a before the character b is matched, it only needs to compare whether the character b is matched each time.

KMP algorithm never rolls back

TXT
Pointer
i
, Do not go back (do not repeat scanning
TXT
), but with the help of
dp
The information stored in the array is
pat
Moving to the correct position to continue matching
, the time complexity is only O(N), and space is exchanged for time, so I think it is a dynamic programming algorithm.

The difficulty of the KMP algorithm is how to calculate

dp
The information in the array? How to move correctly based on this information
pat
Pointers? This needs to be assisted by determining the finite state automata . Don t be afraid of this kind of tall literary vocabulary.
dp
Arrays are exactly the same, and you can use this word to scare others when you learn it.

One more thing to be clear is: calculate this

dp
Array, and only
pat
String related
. Means, just give me a
pat
, I can calculate from this pattern string
dp
Array, then you can give me different
TXT
, I'm not afraid, use this
dp
Array I can complete string matching in O(N) time.

Specifically, such as the two examples cited above:

= the txt1 "aaacaaab" PAT = "AAAB" txt2 = "aaaaaaab" PAT = "AAAB" copy the code

our

TXT
Different but
pat
Is the same, so the KMP algorithm uses
dp
The array is the same.

Just for

txt1
The following impending mismatch of:

dp
Array indicator
pat
Move like this:

PS: this

j
Don't understand it as an index, its meaning should be more accurately state (state), so it will appear in this strange position, which will be detailed later.

And for

txt2
The following impending mismatch of:

dp
Array indicator
pat
Move like this:

understood

dp
Array only sum
pat
Related, then we design the KMP algorithm like this will be more beautiful:

public class KMP { private int [][] dp; private String pat; public KMP (String pat) { this .pat = pat; //Constructing a dp array through pat //It takes O(M) time } public int search (String txt) { //Use dp array to match txt //It takes O(N) time } } Copy code

In this way, when we need to use the same

pat
To match different
TXT
Time, you don t need to waste time constructing
dp
Array up:

KMP = the KMP new new the KMP ( "AAAB" ); int POS1 = kmp.search ( "aaacaaab" ); //. 4 int POS2 = kmp.search ( "aaaaaaab" ); //. 4 copy the code

2. state machine overview

Why is the KMP algorithm related to the state machine? That's it, we can think

pat
The match is the transition of state. For example, when pat = "ABABC":

As shown in the figure above, the number in the circle is the state, state 0 is the starting state, and state 5 (

pat.length
) Is the termination state. When starting to match
pat
In the initial state, once it transitions to the end state, it means that the
TXT
Found in
pat
. For example, if it is currently in state 2, it means that the character "AB" is matched:

In addition, when in a different state,

pat
The behavior of state transition is also different. For example, suppose it is now matched to state 4, if it encounters character A, it should transition to state 3, it should transition to state 5 when it encounters character C, and it should transition to state 0 if it encounters character B:

What do you mean specifically, let's take a look at one by one. Use variables

j
Represents a pointer to the current state, the current
pat
Matched to state 4:

If the character "A" is encountered, it is the smartest to move to state 3 according to the arrow indication:

If you encounter the character "B", according to the arrow, you can only transition to state 0 (return to before liberation overnight):

If the character "C" is encountered, according to the arrow, it should transition to the end state 5, which means that the match is complete:

Of course, other characters may be encountered, such as Z, but obviously it should be transferred to the starting state 0, because

pat
There is no character Z at all in:

For the sake of clarity here, when we draw the state diagram, we omit the arrow that transfers other characters to state 0, and only draw

pat
State transition of characters appearing in:

The most critical step of the KMP algorithm is to construct this state transition diagram. To determine the behavior of state transition, two variables must be clarified, one is the current matching state, and the other is the character encountered ; after determining these two variables, you can know which state should be transitioned to in this case.

Let's take a look at the KMP algorithm matching strings according to this state transition diagram

TXT
the process of:

Please remember this GIF matching process, this is the core logic of the KMP algorithm !

In order to describe the state transition diagram, we define a two-dimensional dp array, its meaning is as follows:

dp[j][c] = next 0 <= j <M, representing the current state 0 <= c < 256 , representing the characters encountered (ASCII code) 0 <= next <= M, representing the next state dp[ 4 ][ 'A' ] = 3 means: The current state is 4 , if the character A is encountered, pat should transition to state 3 dp[ 1 ][ 'B' ] = 2 means: The current state is 1 , if the character B is encountered, pat should transition to state 2 duplicated code

According to the definition of our dp array and the state transition process just now, we can first write the search function code of the KMP algorithm:

public int search (String txt) { int M = pat.length(); int N = txt.length(); //the initial state of pat is 0 int j = 0 ; for ( int i = 0 ; i <N; i++) { //The current state is j, and the character txt[i] is encountered, //which state should pat transition to? j = dp[j][txt.charAt(i)]; //If the end state is reached, return the index at the beginning of the match if (j == M) return i-M + 1 ; } //Did not reach the termination state, the match failed return - 1 ; } Copy code

Up to this point, it should be easy to understand.

dp
The array is the state transition diagram we just drew. If it is not clear, go back and look at the evolution of the GIF algorithm. Explained below: how to pass
pat
Build this
dp
Array?

3. build a state transition diagram

Recall what I just said: To determine the behavior of state transition, two variables must be clarified, one is the current matching state, and the other is the character encountered , and we have determined it according to this logic.

dp
The meaning of the array, then construct
dp
The frame of the array is like this:

for 0 <= j <M: # state for 0 <= c < 256 : # character dp[j][c] = next Copy code

How should I ask for this next status? Obviously, if the characters encountered

c
with
pat[j]
If there is a match
, the state should advance by one, that is to say
next = j + 1
, We might as well call this situation as state advancement :

If character

c
with
pat[j]
If there is no match
, the state will be rolled back (or stay in place). We might as well call this situation a state restart :

So, how do you know in which state to restart? Before answering this question, let's define another name: shadow state (the name I made up), using variables

X
Said. The so-called shadow state means the same prefix as the current state . For example, the following situation:

Current state

j = 4
, Its shadow state is
X = 2
, They all have the same prefix "AB". Because of the state
X
And status
j
The same prefix exists, so when the state
j
When preparing to restart the state (characters encountered
c
with
pat[j]
Does not match), you can pass
X
State transition diagram to get the most recent restart position .

For example, in the situation just now, if the status

j
Where should I transfer to a character "A"? 1. the state can only be advanced when encountering "C", and when encountering "A", the state can only be restarted. status
j
Will delegate this character to the state
X
Processing, that is
dp[j]['A'] = dp[X]['A']
:

Why is this all right? Because: since

j
It has been determined that the character "A" cannot be advanced and can only be rolled back , and KMP is to roll back as little as possible to avoid redundant calculations. Then
j
You can ask if you have the same prefix as yourself
X
,in case
X
When you meet "A", you can proceed with "state advancement", then move to the past, because this way the regression is the least.

Of course, if the character encountered is "B", the state

X
You cannot perform "state advancement", you can only roll back,
j
Just follow
X
Just go back in the direction of the guide:

You may ask, this

X
How do you know that you want to fall back to state 0 when you encounter the character "B"? because
X
Always follow
j
Behind the state
X
How to transfer has been calculated before. Isn't the dynamic programming algorithm just using past results to solve current problems?

In this way, we will refine the framework code just now:

int X # Shadow state for 0 <= j <M: for 0 <= c < 256 : if c == pat[j]: # State advancement dp[j][c] = j + 1 else : # State restart # Commission X to calculate the restart position dp[j][c] = dp[X][c] Copy code

4. code implementation

If you can understand the previous content, congratulations, now there is one question left: shadow state

X
How did you get it? Let's look directly at the complete code below.

public class KMP { private int [][] dp; private String pat; public KMP (String pat) { this .pat = pat; int M = pat.length(); //dp[state][char] = next state dp = new int [M][ 256 ]; //base case dp[ 0 ][pat.charAt( 0 )] = 1 ; //The shadow state X is initially 0 int X = 0 ; //The current state j starts from 1 for ( int j = 1 ; j <M; j++) { for ( int c = 0 ; c < 256; c++) { if (pat.charAt(j) == c) dp[j][c] = j + 1 ; else dp[j][c] = dp[X][c]; } //update shadow status X = dp[X][pat.charAt(j)]; } } public int search (String txt) {...} } Copy code

First explain this line of code:

//base case dp[ 0 ][pat.charAt( 0 )] = 1 ; copy the code

This line of code is the base case. Only when the character pat[0] is encountered can the state transition from 0 to 1. If other characters are encountered, the state will remain at state 0 (the Java default initialization array is all 0).

Shadow state

X
Is first initialized to 0, and then follows
j
The progress and continuous update. Let s take a look at how to update the shadow status
X
:

int X = 0 ; for ( int j = 1 ; j <M; j++) { ... //Update the shadow state //The current state is X, and the character pat[j] is encountered, //Which state should pat transition to? X = dp[X][pat.charAt(j)]; } Copy code

Update

X
Actually and
search
Update status in function
j
The process is very similar:

int j = 0 ; for ( int i = 0 ; i <N; i++) { //The current state is j, and the character txt[i] is encountered, //which state should pat transition to? j = dp[j][txt.charAt(i)]; ... } Copy code

The principle is very subtle . Pay attention to the initial value of the variable in the for loop in the code. It can be understood like this: the latter is in

TXT
Medium match
pat
, The former is in
pat
Medium match
pat[1..end]
,status
X
Always behind
j
A state with
j
Have the longest identical prefix. So i put
X
The analogy to the shadow state seems to be a little bit apt.

In addition, the construction of the dp array is based on the base case

dp[0][..]
Deduction backwards. This is why I think the KMP algorithm is a dynamic programming algorithm.

Let's take a look at the complete construction process of the state transition diagram, you can understand the state

X
The effect is subtle:

At this point, the core of the KMP algorithm is finally finished! Take a look at the complete code of the KMP algorithm:

public class KMP { private int [][] dp; private String pat; public KMP (String pat) { this .pat = pat; int M = pat.length(); //dp[state][char] = next state dp = new int [M][ 256 ]; //base case dp[ 0 ][pat.charAt( 0 )] = 1 ; //The shadow state X is initially 0 int X = 0 ; //Construct a state transition graph (slightly modified and more compact) for ( int j = 1 ; j <M; j++) { for ( int c = 0 ; c <256 ; c++) { dp[j][c] = dp[X][c]; dp[j][pat.charAt(j)] = j + 1 ; //update the shadow state X = dp[X][pat.charAt(j)]; } } public int search (String txt) { int M = pat.length(); int N = txt.length(); //the initial state of pat is 0 int j = 0 ; for ( int i = 0 ; i <N; i++) { //Calculate the next state of pat j = dp[j][txt.charAt(i)]; //Reach the end state, return the result if (j == M) return i-M + 1 ; } //Did not reach the termination state, the match failed return - 1 ; } } Copy code

After the previous detailed examples, you should be able to understand the meaning of this code. Of course, you can also write the KMP algorithm as a function. The core code is the part of the for loop in the two functions. Is there more than ten lines in the count?

5. the final summary

The traditional KMP algorithm uses a one-dimensional array

next
Record the prefix information, and this article uses a two-dimensional array
dp
The character matching problem is solved from the perspective of state transition, but the space complexity is still $ O(256M) = O(M)$.

in

pat
match
TXT
In the process, as long as you clarify the two questions of "Which state you are currently in" and "What is the character encountered", you can determine which state (advance or retreat) should be transferred to.

For a pattern string

pat
There are a total of M states, and for ASCII characters, there are no more than 256 states. So we construct an array
dp[M][256]
To cover all situations and make it clear
dp
The meaning of the array:

dp[j][c] = next
Indicates that the current state is
j
, Characters encountered
c
, Should be transferred to the state
next
.

After clarifying its meaning, you can easily write the code of the search function.

For how to build this

dp
Array, requires an auxiliary state
X
, It is always better than the current state
j
One state behind, owning and
j
The longest identical prefix, we named it "Shadow State".

In the current state of the build

j
When the transfer direction, only characters
pat[j]
In order to advance the state (
dp[j][pat[j]] = j+1
); For other characters, only the state can be rolled back, so you should consult the shadow state
X
Where should I fall back to (
dp[j][other] = dp[X][other]
,among them
other
Except
pat[j]
All characters except).

For the shadow state

X
, We initialize it to 0, and follow
j
Update, update method and search process update
j
The process is very similar (
X = dp[X][pat[j]]
).

The KMP algorithm is also a matter of dynamic programming, and it is based on a set of frameworks. It is nothing more than describing the logic of the problem and making it clear

dp
The meaning of the array, the definition of base case is a bit of a problem. I hope this article will give you a deeper understanding of dynamic programming.

.

in

pat
match
TXT
In the process, as long as you clarify the two questions of "Which state you are currently in" and "What is the character encountered", you can determine which state (advance or retreat) should be transferred to.

For a pattern string

pat
There are a total of M states, and for ASCII characters, there are no more than 256 states. So we construct an array
dp[M][256]
To cover all situations and make it clear
dp
The meaning of the array:

dp[j][c] = next
Indicates that the current state is
j
, Characters encountered
c
, Should be transferred to the state
next
.

After clarifying its meaning, you can easily write the code of the search function.

For how to build this

dp
Array, requires an auxiliary state
X
, It is always better than the current state
j
One state behind, owning and
j
The longest identical prefix, we named it "Shadow State".

In the current state of the build

j
When the transfer direction, only characters
pat[j]
In order to advance the state (
dp[j][pat[j]] = j+1
); For other characters, only the state can be rolled back, so you should consult the shadow state
X
Where should I fall back to (
dp[j][other] = dp[X][other]
,among them
other
Except
pat[j]
All characters except).

For the shadow state

X
, We initialize it to 0, and follow
j
Update, update method and search process update
j
The process is very similar (
X = dp[X][pat[j]]
).

The KMP algorithm is also a matter of dynamic programming, and it is based on a set of frameworks. It is nothing more than describing the logic of the problem and making it clear

dp
The meaning of the array, the definition of base case is a bit of a problem. I hope this article will give you a deeper understanding of dynamic programming.