leetcode-28-find-the-index-of-the-first-occurrence-in-a-string

題目

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

題目連結

Example 1

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2

1
2
3

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.

Constraints:

1 <= haystack.length, needle.length <= 104
haystack and needle consist of only lowercase English characters.

解釋題目

給你兩個字串 haystack 和 needle，請你回傳 needle 在 haystack 中第一次出現的起始索引（index）。如果 needle 沒有出現在 haystack 中，請回傳 -1。

思路 (1) 暴力解

needle 的字串比較短，每一次取 needle 的長度對照 haystack
假設 haystack 的長度是 n；needle 的長度是 m，總共需檢查 n - m + 1 次
特殊情況:
- 如果 needle 長度 (m) 是 0，直接 return 0
- 如果 needle 長度 (m) > haystack 長度 (n)，不可能包含，直接 return -1

程式碼 (2)

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n, m = len(haystack), len(needle)
        if m == 0:
            return 0
        if m > n:
            return -1
        for i in range(n - m + 1):
            if haystack[i : i + m] == needle:
                return i
        return -1

思路 (2) KMP 演算法

參考教學影片

何謂 LPS（Longest Prefix Suffix）
- 前綴（Prefix）：從開頭開始的連續字元，但不一定到結尾。abc 的前綴有：a, ab, abc
- 後綴（Suffix）：從結尾開始往前的連續字元，但不一定到開頭。abc 的後綴有：c, bc, abc
- Longest Prefix Suffix：
  - 找到「既是前綴又是後」的字串
  - 不能把整個字串自己算進去，也就是字串 a 本身的前後綴不能是自己。
- 舉例來說，字串 needle = ababab 的 lps 會是 [0, 0, 1, 2, 3, 4]
  - lps[0] 代表只取一個 a，沒有最長的前後綴，所以 lps[0] = 0
  - lps[1] 代表取了 ab，所以沒有最長的前後綴，所以 lps[1] = 0
  - lps[2] 代表取了 aba，最長的前後綴是 a，所以 lps[2] = 1
  - lps[3] 代表取了 abab，最長的前後綴是 ab，所以 lps[3] = 2
  - lps[4] 代表取了 ababa，最長的前後綴是 aba，所以 lps[3] = 3
  - lps[5] 代表取了 ababab，最長的前後綴是 abab，所以 lps[3] = 4
建立 LPS 陣列（Longest Prefix Suffix）
用雙指針的方式去想，一開始會有兩個指針 prevLPS 跟 i，。prevLPS 會從 0 開始，i 要從 1 開始，因為 lps[0] 一定會等於 0，字串本身不能是前綴或後綴。再來會有三個情況，以 needle = ababc 舉例
1. 匹配成功，也就是 needle[prevLPS] 會跟 needle[i] 比較，匹配成功的話，代表第 prevLPS ( 一開始是 0 ) 個值會跟第 i ( 一開始是1 ) 個值一樣，所以就有一樣的前後綴
  - 此時，lps[i] = prevLPS + 1
  - prevLPS 往下一個數字移動，prevLPS += 1
  - i 也往下一個數字移動，i += 1
2. 匹配不成功的話，考慮 prevLPS 值，如果 prevLPS = 0
  - 代表 prevLPs 指針不能再往回退，所以 lps[i] = 0
  - i 繼續往下一個數字移動
3. 匹配不成功的話，考慮 prevLPS 值，如果 prevLPS != 0
  - prevLPS 指針往回退，退到上一個匹配到的最長前後綴 ( 有可能是 0 )，也就是 lps[prevLPS-1]
  - 因為還是跟 i 指針比較，所以 i 不用變動。
用 KMP 演算法
- 其實跟暴力解一樣也是要透過視窗的方式，一個一個比對字串
- 會是用兩指針去比對，i 是指向 haystack；j 是指向 needle
- 跟暴力解的差異會在比對不成功時， j 指針可以不用退到原點重新比對，可以從前一個 lps 的值開始比對
- 因為 haystack >= needle，所以使用 while 迴圈，當 i 指針走完整個 haystack 就結束迴圈，比對會有三種情況
  1. 如果 haystack[i] == needle[j] 這個是最簡單的狀況，就代表匹配成功， i, j 各自往右走
  2. 如果 haystack[i] != needle[j] 就要看 j 指針的位置
    1. 如果 j 指針是 0，代表無路可退，已經退到原點，那就移動到下個視窗重新比對，所以 i = i + 1
    2. 如果 j 指針還沒走到 0，那 j 就先退到前一個 lps 的位置，也就是 lps[j - 1]

程式碼 (2)

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        if not needle:
            return 0
        
        # Step 1: 建立 LPS 陣列
        def build_lps(needle):
            lps = [0] * len(needle)

            # 雙指針: 同向
            prevLPS, i = 0, 1
            while i < len(needle):
                if needle[i] == needle[prevLPS]:
                    lps[i] = prevLPS + 1 # 是 index 也是 prevLPS 值本身，prevLPS 本身就是指針
                    prevLPS += 1
                    i += 1
                else:
                    # prevLPS 是 0 所以指針不能再往回退，代表沒有一樣的前後綴
                    if prevLPS == 0:
                        lps[i] = 0
                        i += 1
                    else:
                        # prevLPS 不是 0，所以指針往前一個去找 prevLPS - 1
                        prevLPS = lps[prevLPS - 1]

            return lps

        lps = build_lps(needle)

        # Step 2: 開始匹配
        i = j = 0  # i 指向 haystack，j 指向 needle
        while i < len(haystack):
            if haystack[i] == needle[j]:
                i, j = i + 1, j + 1
            else:
                if j == 0:
                    i += 1 
                else:
                    j = lps[j - 1]

            if j== len(needle):
                return i -len(needle)
                    

        return -1  # 沒找到