+ Reply to Thread
Results 1 to 1 of 1

Thread: Regular Languages

  1. #1
    Super Moderator
    Join Date
    Aug 2008
    Posts
    32

    Post Regular Languages

    Regular Expressions:-

    Syntactic equations of the form defined in EBNF generate context-free languages. The term "contextfree" is due to Chomsky and stems from the fact that substitution of the symbol left of = by a sequence derived from the expression to the right of = is always permitted, regardless of the context in which the symbol is embedded within the sentence. It has turned out that this restriction to context freedom is quite acceptable for programming languages, and that it is even desirable. The subclass, known as regular languages, plays a significant role in the realm of programming languages. In essence, they are the context-free languages whose syntax contains no recursion except for the specification of repetition. Since in EBNF repetition is specified directly and without the use of recursion, the following, simple definition can be given:

    A language is regular, if its syntax can be expressed by a single EBNF expression. The requirement that a single equation suffices also implies that only terminal symbols occur in the expression. Such an expression is called a regular expression.

    Two brief examples of regular languages may suffice. The first defines identifiers as they are common in most languages; and the second defines integers in decimal notation. We use the nonterminal symbols letter and digit for the sake of simplicity. They can be eliminated by substitution, whereby a regular expression results for both identifier and integer.

    identifier = letter {letter | digit}.

    integer = digit {digit}.

    letter = "A" | "B" | ... | "Z".

    digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".

    The reason for our interest in regular languages lies in the fact that programs for the recognition of regular sentences are particularly simple and efficient. By "recognition" we mean the determination of the structure of the sentence, and thereby naturally the determination of whether the sentence is well formed, that is, it belongs to the language. Sentence recognition is called syntax analysis.

    For the recognition of regular sentences a finite automaton, also called a state machine, is necessary and sufficient. In each step the state machine reads the next symbol and changes state. The resulting state is solely determined by the previous state and the symbol read. If the resulting state is unique, the state machine is deterministic, otherwise nondeterministic. If the state machine is formulated as a program, the state is represented by the current point of program execution.

    The analysing program can be derived directly from the defining syntax in EBNF. For each EBNF construct K there exists a translation rule which yields a program fragment Pr(K). The translation rules from EBNF to program text are shown below. Therein sym denotes a global variable always representing the symbol last read from the source text by a call to procedure next. Procedure error terminates program execution, signalling that the symbol sequence read so far does not belong to the language.

    K Pr(K)

    "x" IF sym = "x" THEN next ELSE error END

    (exp) Pr(exp)

    [exp] IF sym IN first(exp) THEN Pr(exp) END

    {exp} WHILE sym IN first(exp) DO Pr(exp) END

    fac0 fac1 ... facn Pr(fac0); Pr(fac1); ... Pr(facn

    term0 | term1 | ... | termn CASE sym OF

    first(term0): Pr(term0)

    | first(term1): Pr(term1)
    ...
    | first(termn): Pr(termn)

    END

    The set first(K) contains all symbols with which a sentence derived from construct K may start. It is the set of start symbols of K. For the two examples of identifiers and integers they are:

    first(integer) = digits = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}

    first(identifier) = letters = {"A", "B", ... , "Z"}

    The application of these simple translations rules generating a parser from a given syntax is, however, subject to the syntax being deterministic. This precondition may be formulated more precisely as follows:

    K Cond(K)

    term0 | term1 The terms must not have any common start symbols.

    fac0 fac1 If fac0 contains the empty sequence, then the factors must not have any common start symbols.

    [exp] or {exp} The sets of start symbols of exp and of symbols that may follow K must be disjoint.

    These conditions are satisfied trivially in the examples of identifiers and integers, and therefore we obtain
    the following programs for their recognition:

    IF s IN letters THEN next ELSE error END ;

    WHILE sym IN letters + digits DO

    CASE sym OF

    "A" .. "Z": next

    | "0" .. "9": next

    END

    END

    IF sym IN digits THEN next ELSE error END ;

    WHILE sym IN digits DO next END

    Frequently, the program obtained by applying the translation rules can be simplified by eliminating conditions which are evidently established by preceding conditions. The conditions sym IN letters and sym IN digits are typically formulated as follows:

    ("A" <= sym) & (sym <= "Z") ("0" <= sym) & (sym <= "9")

    The significance of regular languages in connection with programming languages stems from the fact that the latter are typically defined in two stages. First, their syntax is defined in terms of a vocabulary of abstract terminal symbols. Second, these abstract symbols are defined in terms of sequences of concrete terminal symbols, such as ASCII characters. This second definition typically has a regular syntax. The separation into two stages offers the advantage that the definition of the abstract symbols, and thereby of the language, is independent of any concrete representation in terms of any particular character sets used by any particular equipment.

    This separation also has consequences on the structure of a compiler. The process of syntax analysis is based on a procedure to obtain the next symbol. This procedure in turn is based on the definition of symbols in terms of sequences of one or more characters. This latter procedure is called a scanner, and syntax analysis on this second, lower level, lexical analysis. The definition of symbols in terms of characters is typically given in terms of a regular language, and therefore the scanner is typically a state machine.

    We summarize the differences between the two levels as follows:

    Process Input element Algorithm Syntax

    Lexical analysis Character Scanner Regular

    Syntax analysis Symbol Parser Context free
    Last edited by priti; 08-19-2008 at 12:06 PM.

+ Reply to Thread

Similar Threads

  1. Replies: 0
    Last Post: 01-06-2010, 05:30 AM
  2. Replies: 0
    Last Post: 05-21-2009, 06:30 AM
  3. Replies: 0
    Last Post: 03-26-2009, 12:10 PM
  4. Handling Foreign Languages in HTML
    By attractive in forum Graphics
    Replies: 0
    Last Post: 11-19-2008, 08:01 AM
  5. Regular expressions
    By priti in forum Programming
    Replies: 0
    Last Post: 10-06-2008, 11:35 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

LinkBacks Enabled by vBSEO