wikipedia2XML 0.1

License: Free
File size: N/A
User rating: 3.0/5 (1 vote)

A collection of Python scripts for creating and handling an XML corpus (a large collection of text for linguistic purposes) from a Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
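
To give a feel for what a regular-expression-based approach to MediaWiki markup can look like, here is a minimal sketch in Python. It is not taken from wikipedia2XML: the patterns, the function names strip_markup and to_xml, and the <article> element are all assumptions made for illustration, and the sketch covers only bold/italic quotes, internal links, and headings.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for illustration only; not from wikipedia2XML.
# ''italic'', '''bold''', '''''bold italic''''' -> inner text
QUOTES = re.compile(r"'{2,5}(.+?)'{2,5}")
# [[Target]] or [[Target|label]] -> visible text
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]+)\]\]")
# == Heading == (levels 2 to 6) -> heading text
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def strip_markup(wikitext):
    """Reduce a small subset of MediaWiki markup to plain text."""
    text = LINK.sub(r"\1", wikitext)
    text = QUOTES.sub(r"\1", text)
    text = HEADING.sub(r"\2", text)
    return text


def to_xml(title, wikitext):
    """Wrap the cleaned text in a single (assumed) <article> element."""
    article = ET.Element("article", {"title": title})
    article.text = strip_markup(wikitext)
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    sample = "'''Python''' is a [[programming language|language]]."
    print(to_xml("Python", sample))
```

Regular expressions handle flat constructs like these reasonably well, but nested templates and tables generally need more than a single pattern per construct, so a real corpus pipeline would have to go further than this sketch.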

VERSION HISTORY

  • Version 0.1 posted on 2008-04-01
    Several fixes and updates

Program Details