# Generative Grammar and Four Types of Grammars: Parsing Techniques Notes (1)

## Generative Grammar

There are several views of defining a language. The computer science and formal linguistics perspective:

terms | definition |
---|---|

Language | a set of sentences |

Sentence | a sequence of symbols |

Alphabet | a set of all symbols |

The semantics - meaning - of a sentence is described by its tokens cooperating with its structure.

Grammar is the set of rules describing a language.
**Generative Grammar** is

- exact
- fixed-sized

### Language can be specified by infinite Bit-String

Sorted Alphabet: $\Sigma$ Language $\Sigma^{* }$ contains all combinations of symbols in $\Sigma$

The order of sentences in $\Sigma^* $ follows that:

*from length $l = 0$*

- Enumerate all combinations that of length $l$
- these combinations are listed according to alphabetical order (lexicographical sorting)
- increment $l$ and repeat from 1.

This will form an infinitely long sorted list. Every language composed by alphabet $\Sigma$ can be identified by choosing from $\Sigma^* $. If we encode this by binary representation, 0 → not including, 1 → including, we can create an infinitely long bit string that indicates every sentence the language contains.

For example: Language $L = 010010110…$

### Formal Grammar

Recipe of replacing symbols:

```
Name -> tom | dick | harry // Name may be replaced by tom, dick or harry
...
```

A grammar is a *4-tuple* $(V_N, V_T, R, S)$:

- $V_N$ non-terminals, $V_T$ terminals are finite sets of symbols
- $V_N \cap V_T = \varnothing$ terminals and non-terminals cannot have common symbols
- $R$ is the set of rules, a set that contains ordered pairs: ${(P, Q) \mid P\in (V_N\cup V_T)^+ \land Q\in (V_N \cup V_T)^* }$
- $S$ is the start symbol, $S\in V_N$

## Four types of Grammars

### Type 0: Phrase Structure Grammar (PS Grammar)

Most freedom. Follows *4-tuple* $(V_N, V_T, R, S)$, without further restriction.
Represented as `Directed Acyclic Graph`

: No cycle exists

### Type 1: Context-Sensitive Grammar (CS Grammar)

There are two equivalent definitions: `Monotonic`

and `Context-Sensitive`

.
Can be represented by a DAG, similar to PS Grammar.

#### Monotonic

For each rule, left-hand side has more or equal number of symbols to right-hand side.

#### Context-Sensitive

Every rule is context-sensitive.

- Left-hand side contains only one symbol to be changed in the right-hand side.

### Type 2: Context-Free Grammar (CF Grammar)

LHS could only contain one non-terminal symbol. (Thus not related to neighboring symbols). Represented by a tree, as branches of a node is not relevant to other nodes.

The generative power of CF Grammar comes from two operations:

- Concatenation
- Choice (choosing from one of the alternatives in the RHS)

`NT -> tom | NT dick | ...`

### Type 3: Regular Grammar

Mostly referring to `right regular grammar`

.
Each rule could only contain one non-terminal, as the rightmost item.
Represented by a list, because each sentential has only one replaceable item (non-terminal), or in a production chain.

All regular grammar can be expressed in a regular expression, which sufficiently equal to all rules in the grammar.

Regular Expression Notation Styles:

Notation | Description |
---|---|

$^{+ }$ | One or more instances of the left-adjacent item |

$^{* }$ | Zero or more instances of the left-adjacent item |

$^{? }$ | Zero or one instance of the left-adjacent item |

$[abc]$ | Choosing one from $(a, b, c)$, i.e., $(a\mid b\mid c)$ |

Example:

$S_S\to(([tdh],)^{* }[tdh] & )^{? }[tdh]$

### Type 4: Finite Choice Grammar (FC Grammar)

Each rule could have only terminals in right-hand side. Very limited expressive power.