PROGRAMS BY wikipedia2xml.sf.net

  • wikipedia2XML Free

    A collection of Python scripts for creating and handling an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the Medi