A SAX handler that exports revision metadata from a Wikipedia stub dump file into a single XML file.
Inherits DefaultHandler.
Private Attributes
Type | Name | Description
File | exportFile | the file to export to
DocumentBuilderFactory | docFactory | the document builder factory used to create the DOM document builder
DocumentBuilder | docBuilder | the document builder used to create the DOM document
Document | currentDoc | the current DOM document instance
Element | currentRoot | the current root element instance
Stack< Element > | elements = new Stack<Element>() | a stack containing the current parent element
boolean | export = false | indicates if the parser is inside a page element that should be exported
boolean | inTitle = false | indicates if the parser is inside a title element
boolean | inRevision = false | indicates if the parser is inside a revision element
String | pageTitle = "" | the title of the page which should be exported
Detailed Description
A SAX handler that exports revision metadata from a Wikipedia stub dump file into a single XML file.
This handler throws a SAXException with the message "Finished extraction" once the page has been found and exported, so parsing ends before the rest of the dump is read.
See Also
- https://meta.wikimedia.org/wiki/Data_dumps/Dump_format
Author
- Florian Zoubek (zoubek@bitandart.at)
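Because the handler signals success with an exception, calling code has to tell that one SAXException apart from a real parse failure. The following is a minimal sketch of that calling pattern, not the project's actual code: `StopAfterFirstPageHandler` and `EarlyStopDemo` are hypothetical stand-ins, and only the message text "Finished extraction" is taken from the description above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for ExtractPageSAXHandler: aborts after the first closed <page>.
class StopAfterFirstPageHandler extends DefaultHandler {
    int pagesClosed = 0;

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("page".equals(qName)) {
            pagesClosed++;
            // Same signal the documented handler uses once its page is exported.
            throw new SAXException("Finished extraction");
        }
    }
}

class EarlyStopDemo {
    // Returns true if parsing was aborted by the "Finished extraction" signal,
    // false if the parser reached the end of the input without it.
    static boolean parseUntilFinished(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return false;
        } catch (SAXException e) {
            if ("Finished extraction".equals(e.getMessage())) {
                return true; // expected early exit, not an error
            }
            throw new IllegalStateException("unexpected parse failure", e);
        } catch (Exception e) {
            // ParserConfigurationException or IOException
            throw new IllegalStateException(e);
        }
    }
}
```

Matching on the message string is brittle; a dedicated exception subclass would be a sturdier signal, but the sketch follows the behavior the class description states.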
Constructor & Destructor Documentation
ExtractPageSAXHandler.ExtractPageSAXHandler(File exportFile, String pageTitle) throws ParserConfigurationException
Creates a new ExtractPageSAXHandler instance.
For details and behavior of this handler see the class description.
Parameters
exportFile | the file to export to
pageTitle | the title of the page which should be exported
Exceptions
ParserConfigurationException | if the DOM document builder cannot be created
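The declared ParserConfigurationException most plausibly comes from creating the DOM document builder. A minimal sketch of such a setup, assuming the constructor only stores its arguments and initializes `docFactory` and `docBuilder`; the class name `HandlerSetupSketch` is hypothetical and the real constructor body is not shown in this documentation.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical sketch of the constructor's setup; field names follow the documentation.
class HandlerSetupSketch {
    final File exportFile;
    final String pageTitle;
    final DocumentBuilderFactory docFactory;
    final DocumentBuilder docBuilder;

    HandlerSetupSketch(File exportFile, String pageTitle) throws ParserConfigurationException {
        this.exportFile = exportFile;
        this.pageTitle = pageTitle;
        this.docFactory = DocumentBuilderFactory.newInstance();
        // newDocumentBuilder() is the JAXP call that declares ParserConfigurationException.
        this.docBuilder = docFactory.newDocumentBuilder();
    }
}
```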
Member Function Documentation
void ExtractPageSAXHandler.characters(char[] ch, int start, int length) throws SAXException
void ExtractPageSAXHandler.endDocument() throws SAXException
void ExtractPageSAXHandler.endElement(String uri, String localName, String qName) throws SAXException
void ExtractPageSAXHandler.startDocument() throws SAXException
void ExtractPageSAXHandler.startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
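The callbacks listed above carry no per-method description, so the following sketch shows one way startElement, characters and endElement could cooperate through the documented `export`, `inTitle` and `inRevision` flags. It is an illustration under assumptions, not the class's implementation: `FlagTrackingHandler`, `titleText`, `revisionCount` and `countRevisions` are hypothetical names, and the sketch only counts revisions instead of exporting them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: counts the revisions of one page using the documented flags.
class FlagTrackingHandler extends DefaultHandler {
    private final String pageTitle;
    private final StringBuilder titleText = new StringBuilder();
    private boolean export = false;     // inside the page that should be exported
    private boolean inTitle = false;    // inside a <title> element
    private boolean inRevision = false; // inside a <revision> element
    int revisionCount = 0;

    FlagTrackingHandler(String pageTitle) {
        this.pageTitle = pageTitle;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equals(qName)) {
            inTitle = true;
            titleText.setLength(0);
        } else if ("revision".equals(qName)) {
            // inRevision lets later code tell a revision's <id> from the page-level <id>.
            inRevision = true;
            if (export) {
                revisionCount++;
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text node in several chunks, so accumulate.
        if (inTitle) {
            titleText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            inTitle = false;
            export = pageTitle.equals(titleText.toString());
        } else if ("revision".equals(qName)) {
            inRevision = false;
        } else if ("page".equals(qName)) {
            export = false; // leaving the page ends the export window
        }
    }

    static int countRevisions(String xml, String title) {
        FlagTrackingHandler handler = new FlagTrackingHandler(title);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.revisionCount;
    }
}
```

The title is compared in endElement rather than in characters because SAX does not guarantee that a text node arrives in a single characters call.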
Member Data Documentation
Document ExtractPageSAXHandler.currentDoc | private
the current DOM document instance; null until the page has been found.
Element ExtractPageSAXHandler.currentRoot | private
the current root element instance; null until the page has been found.
DocumentBuilder ExtractPageSAXHandler.docBuilder | private
the document builder used to create the DOM document
DocumentBuilderFactory ExtractPageSAXHandler.docFactory | private
the document builder factory used to create the DOM document builder
Stack<Element> ExtractPageSAXHandler.elements = new Stack<Element>() | private
a stack containing the current parent element.
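A parent stack is a common way to rebuild a tree from SAX events: each opened element is attached to the element on top of the stack and then pushed, and each closed element is popped. The sketch below illustrates that technique with a hypothetical `DomMirrorHandler` that copies every element; the documented handler presumably restricts this to the page being exported, which the sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Stack;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: mirrors every SAX element into a DOM tree via a parent stack.
class DomMirrorHandler extends DefaultHandler {
    private final Document doc;
    private final Stack<Element> elements = new Stack<Element>();

    DomMirrorHandler(Document doc) {
        this.doc = doc;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        Element element = doc.createElement(qName);
        for (int i = 0; i < attributes.getLength(); i++) {
            element.setAttribute(attributes.getQName(i), attributes.getValue(i));
        }
        if (elements.isEmpty()) {
            doc.appendChild(element);             // first element becomes the document root
        } else {
            elements.peek().appendChild(element); // otherwise attach to the current parent
        }
        elements.push(element);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!elements.isEmpty()) {
            elements.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        elements.pop(); // the parent of the closed element is on top again
    }

    static Document mirror(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DomMirrorHandler(doc));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```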
boolean ExtractPageSAXHandler.export = false | private
indicates if the parser is inside a page element that should be exported
File ExtractPageSAXHandler.exportFile | private
the file to export to
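How the DOM document reaches `exportFile` is not documented here. One standard way to serialize a DOM Document to a File is an identity Transformer from javax.xml.transform, sketched below; `DomExportSketch`, `write` and `exportSample` are hypothetical names, and the choice of a Transformer is an assumption, not a statement about the class's actual code.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: serializes a DOM document to a file with an identity transform.
class DomExportSketch {
    static void write(Document doc, File exportFile) throws TransformerException {
        // A Transformer created without a stylesheet copies the source unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.transform(new DOMSource(doc), new StreamResult(exportFile));
    }

    // Builds a tiny sample document and writes it to a temporary file.
    static File exportSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element page = doc.createElement("page");
            Element title = doc.createElement("title");
            title.setTextContent("Vienna");
            page.appendChild(title);
            doc.appendChild(page);
            File exportFile = File.createTempFile("export", ".xml");
            write(doc, exportFile);
            return exportFile;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```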
boolean ExtractPageSAXHandler.inRevision = false | private
indicates if the parser is inside a revision element
boolean ExtractPageSAXHandler.inTitle = false | private
indicates if the parser is inside a title element
String ExtractPageSAXHandler.pageTitle = "" | private
the title of the page which should be exported