wikipedia2XML 0.1

License: Free
File size: N/A
Users Rating: 3.0/5 (1 vote)

ABOUT wikipedia2XML

A collection of Python scripts for creating and handling an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
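
To give a rough idea of what a regular-expression-based approach to MediaWiki markup can look like, here is a minimal sketch in Python. It is not taken from wikipedia2XML: the patterns, the function names strip_markup and to_xml, and the <page title="..."> element layout are all hypothetical, and only a small subset of the markup (links, bold/italic quotes, headings) is handled.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns covering a small subset of MediaWiki markup.
# [[target|label]] or [[target]]: keep the label (or the target if no label).
LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
# ''italic'', '''bold''', '''''bold italic''''': drop the quote runs.
QUOTES = re.compile(r"'{2,5}")
# == Heading == (levels 2 to 6): keep only the heading text.
HEADING = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def strip_markup(text):
    """Return text with the markup matched by the patterns above removed."""
    text = LINK.sub(r"\1", text)
    text = QUOTES.sub("", text)
    text = HEADING.sub(r"\2", text)
    return text


def to_xml(title, wikitext):
    """Wrap the cleaned text in a <page> element (assumed layout, not the project's schema)."""
    page = ET.Element("page", {"title": title})
    page.text = strip_markup(wikitext)
    return ET.tostring(page, encoding="unicode")


if __name__ == "__main__":
    sample = "'''Python''' is a [[programming language|language]] by [[Guido van Rossum]]."
    print(to_xml("Python", sample))
```

Real MediaWiki markup also includes templates, tables, references, and nested constructs, which simple patterns like these do not cover.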