Module pyparsing

pyparsing module - Classes and methods to define and execute parsing grammars

The pyparsing module is an alternative approach to creating and executing simple grammars, compared with the traditional lex/yacc approach or the use of regular expressions. With pyparsing, you don't need to learn a new syntax for defining grammars or matching expressions - the pyparsing module provides a library of classes that you use to construct the grammar directly in Python.

Here is a program to parse "Hello, World!" (or any greeting of the form "<salutation>, <addressee>!"):
   from pyparsing import Word, alphas
   
   # define grammar of a greeting
   greet = Word( alphas ) + "," + Word( alphas ) + "!" 
   
   hello = "Hello, World!"
   print(hello, "->", greet.parseString( hello ))
The program outputs the following:
   Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operators.
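
As a small illustration of the operators, the sketch below contrasts '|' (which builds a MatchFirst, where the first listed alternative that matches wins) with '^' (which builds an Or, where the longest match wins). The keyword strings are made up for the example.

```python
from pyparsing import Literal

# '|' builds a MatchFirst: alternatives are tried in the order listed.
first = Literal("in") | Literal("int")

# '^' builds an Or: all alternatives are tried and the longest match is kept.
longest = Literal("in") ^ Literal("int")

first_tokens = first.parseString("int").asList()
longest_tokens = longest.parseString("int").asList()

print(first_tokens)    # ['in']
print(longest_tokens)  # ['int']
```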

The parsed results returned from parseString() can be accessed as a nested list, a dictionary, or an object with named attributes.
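
A minimal sketch of these access styles, reusing the greeting grammar from above. The results names "salutation" and "addressee" are chosen here for illustration only.

```python
from pyparsing import Word, alphas

# Same greeting grammar, with names attached to the interesting tokens.
greet = (
    Word(alphas).setResultsName("salutation")
    + ","
    + Word(alphas).setResultsName("addressee")
    + "!"
)

result = greet.parseString("Hello, World!")

print(result.asList())      # list access: ['Hello', ',', 'World', '!']
print(result[0])            # index access: Hello
print(result["addressee"])  # dictionary-style access: World
print(result.salutation)    # attribute-style access: Hello
```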

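The pyparsing module handles some of the problems that are typically vexing when writing text parsers:
  extra or missing whitespace (the above program will also handle "Hello,World!", "Hello  ,  World  !", etc.)
  quoted strings
  embedded comments
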
Classes
And Requires all given ParseExpressions to be found in the given order.
CaselessLiteral Token to match a specified string, ignoring case of letters.
CharsNotIn Token for matching words composed of characters *not* in a given set.
Combine Converter to concatenate all matching tokens to a single string.
Dict Converter to return a repetitive expression as a list, but also as a dictionary.
Empty An empty token, will always match.
FollowedBy Lookahead matching of the given parse expression.
Forward Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation.
GoToColumn Token to advance to a specific column of input text; useful for tabular report scraping.
Group Converter to return the matched tokens as a list - useful for returning tokens of ZeroOrMore and OneOrMore expressions.
Keyword Token to exactly match a specified string as a keyword, that is, it must be immediately followed by a non-keyword character.
LineEnd Matches if current position is at the end of a line within the parse string.
LineStart Matches if current position is at the beginning of a line within the parse string.
Literal Token to exactly match a specified string.
MatchFirst Requires that at least one ParseExpression is found. If two expressions match, the first one listed is the one that will match.
NoMatch A token that will never match.
NotAny Lookahead to disallow matching with the given parse expression.
OneOrMore Repetition of one or more of the given expression.
Optional Optional matching of the given expression.
Or Requires that at least one ParseExpression is found. If two expressions match, the expression that matches the longest string will be used.
ParseElementEnhance Abstract subclass of ParserElement, for combining and post-processing parsed tokens.
ParseExpression Abstract subclass of ParserElement, for combining and post-processing parsed tokens.
ParserElement Abstract base level parser element class.
ParseResults Structured parse results, providing multiple means of access to the parsed data (as a list, as a dictionary, or by named attribute).
PositionToken  
SkipTo Token for skipping over all undefined text until the matched expression is found.
StringEnd Matches if current position is at the end of the parse string.
StringStart Matches if current position is at the beginning of the parse string.
Suppress Converter for ignoring the results of a parsed expression.
Token Abstract ParserElement subclass, for defining atomic matching patterns.
TokenConverter Abstract subclass of ParseExpression, for converting parsed results.
Upcase Converter to upper case all matching tokens.
White Special matching class for matching whitespace.
Word Token for matching words composed of allowed character sets.
ZeroOrMore Optional repetition of zero or more of the given expression.
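
To show how several of these classes combine, here is a hedged sketch of a recursive grammar for nested, parenthesized lists of integers. The grammar itself is invented for the example; it uses Forward for the recursion, Group to keep each nested list together, and Suppress to drop the parentheses.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, nums

# Forward declares 'nested' before its definition so it can refer to itself.
nested = Forward()

# An item is either an integer or a parenthesized nested list.
item = Word(nums) | Group(Suppress("(") + nested + Suppress(")"))

# Complete the forward declaration.
nested << ZeroOrMore(item)

parsed = nested.parseString("1 (2 (3 4)) 5").asList()
print(parsed)  # ['1', ['2', ['3', '4']], '5']
```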

Exceptions
ParseBaseException base exception class for all parsing runtime exceptions
ParseException exception thrown when a parse expression doesn't match the input
ParseFatalException user-throwable exception thrown when inconsistent parse content is found; stops all parsing immediately
RecursiveGrammarException exception thrown by validate() if the grammar could be improperly recursive

Function Summary
  _expanded(p)
  col(loc, strg)
Returns current column within a string, counting newlines as line separators. The first column is number 1.
  delimitedList(expr, delim, combine)
Helper to define a delimited list of expressions - the delimiter defaults to ','.
  dictOf(key, value)
Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value.
  line(loc, strg)
Returns the line of text containing loc within a string, counting newlines as line separators. The first line is number 1.
  lineno(loc, strg)
Returns current line number within a string, counting newlines as line separators. The first line is number 1.
  makeHTMLTags(tagStr)
Helper to construct opening and closing tag expressions for HTML, given a tag name.
  makeXMLTags(tagStr)
Helper to construct opening and closing tag expressions for XML, given a tag name.
  nullDebugAction(*args)
'Do-nothing' debug action, to suppress debugging output during parsing.
  oneOf(strs, caseless)
Helper to quickly define a set of alternative Literals. Tests longer strings first when alternatives conflict, regardless of the input order, but returns a MatchFirst for best performance.
  removeQuotes(s, l, t)
Helper parse action for removing quotation marks from parsed quoted strings.
  replaceWith(replStr)
Helper method for common parse actions that simply return a literal value.
  srange(s)
Helper to easily define string ranges for use in Word construction.

Variable Summary
str __author__ = 'Paul McGuire <ptmcg@users.sourceforge.net>...
str __version__ = '1.3.1'
str __versionTime__ = '10 June 2005 08:43'
str alphanums = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQ...
str alphas = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRST...
str alphas8bit = '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\x...
And commaSeparatedList = commaSeparatedList
Combine cStyleComment = cStyleComment enclosed in /* ... */
Combine dblQuotedString = string enclosed in double quotes
Empty empty = empty
Combine htmlComment = htmlComment enclosed in <!-- ... -->
str nums = '0123456789'
str printables = '0123456789abcdefghijklmnopqrstuvwxyzABCDEF...
MatchFirst quotedString = quotedString using single or double quote...
Optional restOfLine = rest of line up to \n
Combine sglQuotedString = string enclosed in single quotes

Function Details

col(loc, strg)

Returns current column within a string, counting newlines as line separators. The first column is number 1.

delimitedList(expr, delim=',', combine=False)

Helper to define a delimited list of expressions - the delimiter defaults to ','. By default, the list elements and delimiters can have intervening whitespace and comments, but this can be overridden by passing 'combine=True' in the constructor. If combine is set to True, the matching tokens are returned as a single token string, with the delimiters included; otherwise, the matching tokens are returned as a list of tokens, with the delimiters suppressed.
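
The two modes can be sketched as follows. The element expression and the input strings are illustrative only.

```python
from pyparsing import Word, alphas, delimitedList

identifier = Word(alphas)

# Default: delimiters are suppressed and whitespace around them is skipped.
as_list = delimitedList(identifier)
list_tokens = as_list.parseString("aa, bb ,cc").asList()
print(list_tokens)  # ['aa', 'bb', 'cc']

# combine=True: one token string with the delimiters included.
combined = delimitedList(identifier, delim=".", combine=True)
combined_tokens = combined.parseString("os.path.join").asList()
print(combined_tokens)  # ['os.path.join']
```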

dictOf(key, value)

Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value. Takes care of defining the Dict, ZeroOrMore, and Group tokens in the proper order. The key pattern can include delimiting markers or punctuation, as long as they are suppressed, thereby leaving the significant key text. The value pattern can include named results, so that the Dict results can include named token fields.
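
A minimal sketch, assuming a made-up "name=value" input format. The '=' marker is part of the key pattern but suppressed, so only the key text remains.

```python
from pyparsing import Suppress, Word, alphas, dictOf, nums

# Key pattern: a word followed by a suppressed '=' marker.
key = Word(alphas) + Suppress("=")
# Value pattern: a run of digits.
value = Word(nums)

settings = dictOf(key, value)
result = settings.parseString("width=80 height=24")

print(result["width"])   # 80
print(result["height"])  # 24
print(result.asList())   # [['width', '80'], ['height', '24']]
```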

line(loc, strg)

Returns the line of text containing loc within a string, counting newlines as line separators. The first line is number 1.

lineno(loc, strg)

Returns current line number within a string, counting newlines as line separators. The first line is number 1.
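
The location helpers col(), line() and lineno() all take a 0-based offset into the string and report 1-based positions. A short sketch with an invented three-line string:

```python
from pyparsing import col, line, lineno

text = "alpha\nbeta gamma\ndelta"
loc = text.index("gamma")  # 0-based offset of "gamma" within text

line_number = lineno(loc, text)
column = col(loc, text)
line_text = line(loc, text)

print(line_number)  # 2
print(column)       # 6
print(line_text)    # beta gamma
```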

makeHTMLTags(tagStr)

Helper to construct opening and closing tag expressions for HTML, given a tag name.

makeXMLTags(tagStr)

Helper to construct opening and closing tag expressions for XML, given a tag name.

nullDebugAction(*args)

'Do-nothing' debug action, to suppress debugging output during parsing.

oneOf(strs, caseless=False)

Helper to quickly define a set of alternative Literals. Tests longer strings first when alternatives conflict, regardless of the input order, but returns a MatchFirst for best performance.
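
For instance, the sketch below lists "<" before "<=" in the input string, yet "<=" is still matched in full. The operator set is an arbitrary example.

```python
from pyparsing import oneOf

# Alternatives are given as one whitespace-separated string.
comparison = oneOf("< <= = >= >")

tokens = comparison.parseString("<= 5").asList()
print(tokens)  # ['<=']
```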

removeQuotes(s, l, t)

Helper parse action for removing quotation marks from parsed quoted strings. To use, add this parse action to quoted string using:
 quotedString.setParseAction( removeQuotes )
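
A hedged variant of the same idea: the sketch below attaches the parse action to a copy, so the module-level quotedString is left unmodified for other users of the module. The sample input is invented.

```python
from pyparsing import quotedString, removeQuotes

# Attach the parse action to a copy rather than to the shared expression.
unquoted = quotedString.copy().setParseAction(removeQuotes)

text = unquoted.parseString('"hello world"')[0]
print(text)  # hello world
```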

replaceWith(replStr)

Helper method for common parse actions that simply return a literal value. Especially useful when used with transformString().
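
A small sketch of the transformString() use case, assuming a hypothetical data format in which "N/A" placeholders should become "0":

```python
from pyparsing import Literal, replaceWith

# Every match of the literal is replaced by the fixed string "0".
na = Literal("N/A").setParseAction(replaceWith("0"))

cleaned = na.transformString("10, N/A, 7, N/A")
print(cleaned)  # 10, 0, 7, 0
```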

srange(s)

Helper to easily define string ranges for use in Word construction. Borrows syntax from regexp '[]' string range definitions:
  srange("[0-9]")   -> "0123456789"
  srange("[a-z]")   -> "abcdefghijklmnopqrstuvwxyz"
  srange("[a-z$_]") -> "abcdefghijklmnopqrstuvwxyz$_"
The input string must be enclosed in []'s, and the returned string is the expanded character set joined into a single string. The values enclosed in the []'s may be:
  a single character
  an escaped character with a leading backslash (such as \- or \])
  an escaped hex character with a leading '\0x' (\0x21, which is a '!' character)
  an escaped octal character with a leading '\0' (\041, which is a '!' character)
  a range of any of the above, separated by a dash ('a-z', etc.)
  any combination of the above ('aeiouy', 'a-zA-Z0-9_$', etc.)
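
A typical use is building the character sets for a Word. The identifier pattern below is an illustrative example, not something defined by the module.

```python
from pyparsing import Word, srange

digits = srange("[0-9]")
print(digits)  # 0123456789

# First character: letter or underscore; following characters may also be digits.
identifier = Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))

name = identifier.parseString("max_value2 = 10")[0]
print(name)  # max_value2
```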

Variable Details

__author__

Type:
str
Value:
'Paul McGuire <ptmcg@users.sourceforge.net>'                           

__version__

Type:
str
Value:
'1.3.1'                                                                

__versionTime__

Type:
str
Value:
'10 June 2005 08:43'                                                   

alphanums

Type:
str
Value:
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'       

alphas

Type:
str
Value:
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'                 

alphas8bit

Type:
str
Value:
'\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe'

commaSeparatedList

Type:
And
Value:
commaSeparatedList                                                     

cStyleComment

Type:
Combine
Value:
cStyleComment enclosed in /* ... */                                    

dblQuotedString

Type:
Combine
Value:
string enclosed in double quotes                                       

empty

Type:
Empty
Value:
empty                                                                  

htmlComment

Type:
Combine
Value:
htmlComment enclosed in <!-- ... -->                                   

nums

Type:
str
Value:
'0123456789'                                                           

printables

Type:
str
Value:
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

quotedString

Type:
MatchFirst
Value:
quotedString using single or double quotes                             

restOfLine

Type:
Optional
Value:
rest of line up to \n                                                  

sglQuotedString

Type:
Combine
Value:
string enclosed in single quotes                                       

Generated by Epydoc 2.1 on Sun Jun 12 22:30:07 2005 http://epydoc.sf.net