String Algorithms
字符串的大洞,迟早得补
Notations
We say that patterm $P$ occurs with shift $s$ in text $T$ (or, equivalently, that pattern $P$ occurs beginning at position $s+1$ in text $T$) if $0\leq s\leq n - m$ and $T[s + 1, \cdots, s+m] = P[1, \cdots m]$.
We demote by $\Sigma^*$ the set of all finite-length strings formed using characters from the alphabet $\Sigma$. The zero-length empty string, denoted $\epsilon$, also belongs to $\Sigma^*$. The concatenation of two strings $x$ and $y$, denoted $xy$, has length $|x| + |y|$ and consists of the characters from $x$ followed by the characters from $y$.
We say that a string $w$ is a prefix of a string $x$, denoted $w\sqsubset x$, if $x=wy$ for some string $y\in\Sigma^*$.
Similarly, we say that a string $w$ is a suffix of a string $x$, denoted $w\sqsupset x$ if $x=yw$ for som string $y\in\Sigma^*$.
Finite Automata (弃坑,等复习完 SAM 来补)
Definition
A finite automaton $M$, is a 5-tuple $(Q, q_0, A, \Sigma, \delta)$, where
- $Q$ is a finite set of states
- $q_0\in Q$ is the start state
- $A\subseteq Q$ is a distinguished set of accepting states
- $\Sigma$ is a finite input alphabet
- $\delta$ is a function from $Q\times\Sigma$ into $Q$, called the transition function of $M$
The finite automaton begins in state $q_0$ and reads the characters of its input string one at a time. If the automaton is in state $q$ and reads input character $a$, it moves (“makes a transition”) from state $q$ to state $\delta(q, a)$. Whenever its current state $q$ is a member of $A$, the machine $M$ has accepted the string read so far. An input that is not accepted is rejected.
A finite automaton $M$ includes a function $\phi$, called the final-state function, from $\Sigma^*$ to $Q$ such that $\phi(w)$ is the state $M$ ends up in after scanning the string $w$. We define the function $\phi$ recursively, using the transition function:
$$
\begin{aligned}
\phi(\epsilon) &= q_0,\\
\phi(wa) &= \delta(\phi(w), a)\quad\text{for}\thinspace w\in\Sigma^*, a\in\Sigma.
\end{aligned}
$$
String-matching automata
For a given oattern $P$, we construct a string-matching automaton in a preprocessing step brfore using it to search the text string.
We first define an auxiliary function $\sigma$, called the suffix function corresponding to $P$. The function $\sigma$ maps $\Sigma^*$ to $\lbrace 0, 1, \cdots, m\rbrace$ such that $\sigma(x)$ is the length of the longest prefix of $P$ that is also a suffix of $x$:
$$\sigma(x) = \max\lbrace k:P[1..k]\sqsupset x\rbrace$$
KMP
Here, we assume $1$-based indexes.
Given a pattern $P[1, \cdots, m]$, the prefix function for the pattern $P$ is the function $\pi:\lbrace1, 2, \cdots, m\rbrace\to\lbrace0, 1, \cdots, m - 1\rbrace$ such that
$$\pi(q) = \max\lbrace k: k < q \thinspace\text{and}\thinspace P[1\cdots k]\sqsupset P[1\cdots q]\rbrace$$
That is, $\pi(q)$ is the length of the longest prefix of $P$ that is a proper suffix of $P[1\cdots q]$.
Take LOJ 103 as an exmaple.
1 |
|