A collection of Python scripts for creating and handling an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
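As a rough illustration of what regular-expression-based handling of MediaWiki markup can look like, here is a minimal sketch. It is not taken from the project's source: the pattern names, the functions strip_markup and to_xml, and the <article> element are all hypothetical, and only a few markup constructs (non-nested templates, internal links, bold/italic quotes, headings) are covered.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical patterns for illustration only; real MediaWiki markup has
# many more constructs (tables, nested templates, references, ...).
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                      # {{non-nested template}}
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")         # [[target|label]] or [[target]]
QUOTES = re.compile(r"'{2,5}")                                # ''italic'', '''bold'''
HEADING = re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.MULTILINE)


def strip_markup(wikitext):
    """Return plain text with a few common markup constructs removed."""
    text = TEMPLATE.sub("", wikitext)   # drop templates entirely
    text = LINK.sub(r"\1", text)        # keep only the visible link label
    text = QUOTES.sub("", text)         # drop bold/italic quote runs
    text = HEADING.sub(r"\1", text)     # keep heading text, drop the '=' signs
    return text.strip()


def to_xml(title, wikitext):
    """Wrap the stripped text in a single (hypothetical) <article> element."""
    return "<article title=%s>%s</article>" % (
        quoteattr(title),
        escape(strip_markup(wikitext)),
    )


if __name__ == "__main__":
    print(to_xml("Example", "'''Example''' links to [[Linux|a kernel]]."))
```

A real dump would need far more patterns than this, and nested templates in particular cannot be handled by a single non-recursive regular expression, so the sketch should be read only as the general shape of the approach.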
VERSION HISTORY
- Version 0.1 posted on 2008-04-01: Several fixes and updates
Program Details
- Category: Education > Other
- Publisher: wikipedia2xml.sf.net
- License: Free
- Price: N/A
- Version: 0.1
- Platform: Windows