String Algorithms

字符串的大洞,迟早得补

Notations

We say that patterm $P$ occurs with shift $s$ in text $T$ (or, equivalently, that pattern $P$ occurs beginning at position $s+1$ in text $T$) if $0\leq s\leq n - m$ and $T[s + 1, \cdots, s+m] = P[1, \cdots m]$.

We demote by $\Sigma^*$ the set of all finite-length strings formed using characters from the alphabet $\Sigma$. The zero-length empty string, denoted $\epsilon$, also belongs to $\Sigma^*$. The concatenation of two strings $x$ and $y$, denoted $xy$, has length $|x| + |y|$ and consists of the characters from $x$ followed by the characters from $y$.

We say that a string $w$ is a prefix of a string $x$, denoted $w\sqsubset x$, if $x=wy$ for some string $y\in\Sigma^*$.
Similarly, we say that a string $w$ is a suffix of a string $x$, denoted $w\sqsupset x$ if $x=yw$ for som string $y\in\Sigma^*$.

Finite Automata (弃坑,等复习完 SAM 来补)

Definition

A finite automaton $M$, is a 5-tuple $(Q, q_0, A, \Sigma, \delta)$, where

  • $Q$ is a finite set of states
  • $q_0\in Q$ is the start state
  • $A\subseteq Q$ is a distinguished set of accepting states
  • $\Sigma$ is a finite input alphabet
  • $\delta$ is a function from $Q\times\Sigma$ into $Q$, called the transition function of $M$

The finite automaton begins in state $q_0$ and reads the characters of its input string one at a time. If the automaton is in state $q$ and reads input character $a$, it moves (“makes a transition”) from state $q$ to state $\delta(q, a)$. Whenever its current state $q$ is a member of $A$, the machine $M$ has accepted the string read so far. An input that is not accepted is rejected.

A finite automaton $M$ includes a function $\phi$, called the final-state function, from $\Sigma^*$ to $Q$ such that $\phi(w)$ is the state $M$ ends up in after scanning the string $w$. We define the function $\phi$ recursively, using the transition function:

$$
\begin{aligned}
\phi(\epsilon) &= q_0,\\
\phi(wa) &= \delta(\phi(w), a)\quad\text{for}\thinspace w\in\Sigma^*, a\in\Sigma.
\end{aligned}
$$

String-matching automata

For a given oattern $P$, we construct a string-matching automaton in a preprocessing step brfore using it to search the text string.

We first define an auxiliary function $\sigma$, called the suffix function corresponding to $P$. The function $\sigma$ maps $\Sigma^*$ to $\lbrace 0, 1, \cdots, m\rbrace$ such that $\sigma(x)$ is the length of the longest prefix of $P$ that is also a suffix of $x$:

$$\sigma(x) = \max\lbrace k:P[1..k]\sqsupset x\rbrace$$

KMP

Here, we assume $1$-based indexes.

Given a pattern $P[1, \cdots, m]$, the prefix function for the pattern $P$ is the function $\pi:\lbrace1, 2, \cdots, m\rbrace\to\lbrace0, 1, \cdots, m - 1\rbrace$ such that

$$\pi(q) = \max\lbrace k: k < q \thinspace\text{and}\thinspace P[1\cdots k]\sqsupset P[1\cdots q]\rbrace$$

That is, $\pi(q)$ is the length of the longest prefix of $P$ that is a proper suffix of $P[1\cdots q]$.

Take LOJ 103 as an exmaple.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// t = ' ' + t, p = ' ' + p
void prefixFunction(std::string p) // match P against itself
{
memset(pi, 0, sizeof pi);
int m = p.length() - 1, k = 0;
for (int q = 2; q <= m; ++q)
{
while (k && p[k + 1] != p[q])
k = pi[k];
if (p[k + 1] == p[q])
++k;
pi[q] = k;
}
}

int KMP(std::string t, std::string p) // match T against P
{
int r = 0, n = t.length() - 1, m = p.length() - 1, q = 0; // q: number of character matched
prefixFunction(p);
for (int i = 1; i <= n; ++i)
{
while (q && p[q + 1] != t[i])
q = pi[q]; // next character does not match
if (p[q + 1] == t[i])
++q; // next character matches
if (q == m) // is all of P matched ?
{
q = pi[q]; // look for the next match
++r;
}
}
return r;
}

Manacher

Manacher

Z algorithm

Z algorithm


String Algorithms
https://chenz01.github.io/2023/01/08/String Algorithms/
作者
ChenZ01
发布于
2023年1月8日
许可协议