Safe Haskell | Safe-Inferred |
---|
Text.XmlHtml.HTML.Parse
- docFragment :: Encoding -> Parser Document
- prolog :: Parser (Maybe DocType, [Node])
- docTypeDecl :: Parser DocType
- externalID :: Parser ExternalID
- data ElemResult
- = Matched
- | ImplicitLast Text
- | ImplicitNext Text Text [(Text, Text)] Bool
- finishElement :: Text -> Text -> [(Text, Text)] -> Bool -> Parser (Node, ElemResult)
- emptyOrStartTag :: Parser (Text, Text, [(Text, Text)], Bool)
- attrName :: Parser Text
- isControlChar :: Char -> Bool
- quotedAttrValue :: Parser Text
- unquotedAttrValue :: Parser Text
- attrValue :: Parser Text
- attribute :: Parser (Text, Text)
- endTag :: Text -> Parser ElemResult
- content :: Maybe Text -> Parser ([Node], ElemResult)
- reference :: Parser Text
- finishCharRef :: Parser Char
- finishEntityRef :: Parser Text
Documentation
docFragment :: Encoding -> Parser Document
HTML version of the document fragment parsing rule. It differs only in that it parses the HTML version of content and returns an HtmlDocument.
docTypeDecl :: Parser DocType
The internal subset is parsed but ignored, since we don't have data types to store it.
data ElemResult
When parsing an element, three things can happen (besides failure):
- The end tag matches the start tag. This is a Matched.
- The end tag does not match, but the element has an end tag that can be omitted when there is no more content in its parent. This is an ImplicitLast. In this case, we need to remember the tag name of the end tag that we did find, so as to match it later.
- A start tag is found such that it implicitly ends the current element. This is an ImplicitNext. In this case, we parse and remember the entire element that comes next, so that it can be inserted after the element being parsed.
Constructors
- Matched
- ImplicitLast Text
- ImplicitNext Text Text [(Text, Text)] Bool
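The three outcomes above can be sketched as a small case analysis. This is an illustration only, not the module's code: the `deriving` clause and the helper `pending` are hypothetical, and the sketch assumes that the first `Text` field of `ImplicitNext` is the tag name and the list holds its attributes (the field types line up with the arguments of `finishElement`, but the documentation does not name them).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import qualified Data.Text as T

-- Same constructors as documented above. The deriving clause is an
-- addition for this sketch so that values can be shown and compared.
data ElemResult
    = Matched
    | ImplicitLast Text
    | ImplicitNext Text Text [(Text, Text)] Bool
    deriving (Eq, Show)

-- Hypothetical helper (not part of Text.XmlHtml.HTML.Parse): describe
-- what the caller still has to do after an element has been parsed.
pending :: ElemResult -> Text
pending Matched =
    "nothing: the element's own end tag was consumed"
pending (ImplicitLast found) =
    -- The end tag belongs to some enclosing element, so pass it upward.
    T.concat ["match </", found, "> against an enclosing element"]
pending (ImplicitNext name _ attrs _) =
    -- Assumption: the first field is the name of the tag that
    -- implicitly closed the current element.
    T.concat [ "insert <", name, "> with "
             , T.pack (show (length attrs))
             , " attribute(s) after the current element" ]
```

Loading the sketch in GHCi and evaluating `pending (ImplicitLast "ul")` shows the case where an end tag was found that must be matched further up the tree.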
finishElement :: Text -> Text -> [(Text, Text)] -> Bool -> Parser (Node, ElemResult)
isControlChar :: Char -> Bool
From 8.2.2.3 of the HTML 5 spec, omitting the very high control characters because they are unlikely to occur and I got tired of typing.
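A predicate of this kind might look like the following minimal sketch. The name `isControlCharSketch` is hypothetical, and the code point ranges are an assumption based on one reading of the input-stream section of the HTML 5 spec; they are not taken from the library and may differ from what `isControlChar` actually checks. Code points above U+FFFF are left out, in line with the remark above.

```haskell
-- Sketch only: the ranges are an assumption, not the library's definition.
-- Tab (U+0009), line feed (U+000A), form feed (U+000C) and carriage
-- return (U+000D) are deliberately not treated as control characters.
isControlCharSketch :: Char -> Bool
isControlCharSketch c =
       inRange '\x0001' '\x0008'
    || c == '\x000B'
    || inRange '\x000E' '\x001F'
    || inRange '\x007F' '\x009F'
    || inRange '\xFDD0' '\xFDEF'
    || c == '\xFFFE'
    || c == '\xFFFF'
  where
    -- Inclusive range test on the character being classified.
    inRange lo hi = c >= lo && c <= hi
```

A caller could use it with `any isControlCharSketch` to reject a candidate attribute value, or with `filter (not . isControlCharSketch)` to strip such characters from a `String`.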
endTag :: Text -> Parser ElemResult