Archive:Word format

From semantic-mediawiki.org
Word format
Outputs the result in Microsoft Office Word file format (doc/docx).
Further Information
Provided by: Extension "Semantic Result Formats"
Added:
Removed:
Requirements: MW 1.21+ with the "PhpOffice/PhpWord" library, or MW 1.22+ with the "PhpWord" library (handled by Composer)
Format name: word
Enabled by default: yes
Authors: Wolfgang Fahl
Categories: export
Since July 22, 2015, this format has been proposed as an enhancement request and is available only via a fork on GitHub. See Semantic Result Formats: GitHub issue 114 comment no – Add MS Word format

The result format word is used to format query results as a Word file.

If the required PHPWord library is installed, this format is automatically available (SRF ≥ 1.9.1).

Parameters

  • templatefile - the name of a docx Word file containing ${needle} placeholders. The file is looked up automatically in the File: namespace.

Example

This result format is not available on this wiki, so no example output can be provided.

We'd like to get a table of cities with a population of more than 1 million people, sorted by population:

{{#ask:
 [[Category:City]]
 [[Population::>1000000]]
 |?Population
 |sort=Population
}}
The same query for the Word format:
{{#ask:
 [[Category:City]]
 [[Population::>1000000]]
 |?Population
 |sort=Population
 |searchlabel=Download result as Word file
 |templatefile=GermanCities.docx
 |format=word
}}

Preparing a Template File

The template file must contain ${needle} placeholders wherever field results are to be inserted; for example, ${population} would hold the population result.
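
To see which placeholders a template actually contains, you can list them straight from the docx archive. The following is a minimal sketch and not part of SRF: the function name list_needles is made up for illustration, and it assumes that each needle is stored contiguously in word/document.xml, which Word does not always guarantee.

```shell
#!/bin/sh
# list_needles (hypothetical helper, not part of SRF or PHPWord):
# read the XML of word/document.xml on stdin and print every distinct
# ${...} placeholder, one per line.
list_needles() {
  # -o prints only the matching part of each line;
  # sort -u removes duplicate needles
  grep -o '\${[^}<]*}' | sort -u
}
```

A possible call for the example template is: unzip -p GermanCities.docx word/document.xml | list_needles. Here unzip -p writes the archive member to standard output instead of unpacking it to disk.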

Caveats

Unfortunately, Microsoft Word might insert extra characters into a file when saving it; see this issue on stackoverflow.com.

To avoid this, you might want to:

  • switch off correction mode (which might add red markup)
  • use cut and paste in a formatless mode

You might want to check that the ${…} needles were not spoiled in the resulting docx XML. You can check this by unzipping the docx file and looking into the word/document.xml file.

A tool like xmlstarlet can help with this.

Here are a few lines of bash as an example:

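# unpack the docx (a zip archive) into the current directory
unzip -o GermanCities.docx
# pretty-print the main document part and show the lines containing each keyword
for keyword in population
do
  xmlstarlet fo word/document.xml | grep -- "$keyword"
done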

The result should look like:

  <w:t>${population}</w:t>
…

Wrapped into a script named "caveat", this check looks like:

#!/bin/bash
#   Copyright (C) 2015 BITPlan GmbH
#   wf 2015-09-29
#   check that a word template is ok for being used with the
#   SMW word result format
#   see http://semantic-mediawiki.org/wiki/Help:Word_format

#
# show usage
#
usage() {
  echo "usage: $0 wordtemplatefile keywords"
  exit 1
}

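# check command line parameters - there must be at least two
if [ $# -lt 2 ]
then
  usage
fi

file="$1"
# keywords are passed as one space-separated argument, e.g. "population name"
keywords="$2"
if [ ! -f "$file" ]
then
  echo "$file does not exist" 1>&2
  exit 1
else
  # unpack the docx (a zip archive) into the current directory
  unzip -o "$file" > /dev/null
  # intentionally unquoted so that the keyword list is split on whitespace
  for keyword in $keywords
  do
    xmlstarlet fo word/document.xml | grep -- "$keyword"
  done
fi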
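
The script above prints the lines on which a keyword occurs, but it leaves it to the reader to spot whether Word has split a needle across several XML runs. The following sketch tries to report such needles directly by comparing the raw XML with the same text after all tags are removed. It is an illustration under stated assumptions, not part of SRF: the function name find_split_needles is hypothetical, no XML tag may span more than one line, and a needle that occurs both intact and split in the same document is not reported.

```shell
#!/bin/sh
# find_split_needles (hypothetical helper, not part of SRF or PHPWord):
# read the XML of word/document.xml on stdin and print every ${...}
# needle that is only recognisable after the XML tags are removed,
# i.e. a needle that is interrupted by markup in the raw XML.
find_split_needles() {
  xml=$(cat)
  # needles that occur contiguously in the raw XML
  intact=$(printf '%s\n' "$xml" | grep -o '\${[^}<]*}' | sort -u)
  # needles that occur once all tags have been stripped
  all=$(printf '%s\n' "$xml" | sed 's/<[^>]*>//g' | grep -o '\${[^}]*}' | sort -u)
  printf '%s\n' "$all" | while IFS= read -r needle; do
    [ -n "$needle" ] || continue
    # report the needle unless it is also present in intact form
    printf '%s\n' "$intact" | grep -qxF -e "$needle" || printf '%s\n' "$needle"
  done
}
```

A possible call is: unzip -p GermanCities.docx word/document.xml | find_split_needles. No output means that no split needle was found under the assumptions above.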

Installation

This section describes how to install the required PHPWord library with Composer, which is the recommended method for MW 1.22+. Either enter the following on your command line:

composer require phpoffice/phpword dev-master

or add the following as the last line of the "require" section in your "composer.json" file:

"phpoffice/phpword": "dev-master"

Note: Replace "dev-master" in this example with the version you want to install.

Patching TemplateProcessor.php for Image handling

If you'd like to insert images into your Word file, you might want to patch the TemplateProcessor.php file of PhpOffice/PhpWord as shown in the diff below.

The SRF_Word format will automatically detect that the method searchImageId is available and will use it.

neso:PhpWord wf$ rcsdiff TemplateProcessor.php 
===================================================================
RCS file: RCS/TemplateProcessor.php,v
retrieving revision 1.1
diff -r1.1 TemplateProcessor.php
61a62,68
>     
>     /**
>      * Content of document rels (in XML format) of the temporary document.
>      *
>      * @var string
>      */
>     private $temporaryDocumentRels; 
101a109
>         $this->temporaryDocumentRels = $this->zipClass->getFromName('word/_rels/document.xml.rels');
508a517,583
>     // 
>     // Image handling
>     // see http://stackoverflow.com/questions/24018003/how-to-add-set-images-on-phpoffice-phpword-template
>     // 
>  
>     /**
>      * Set a new image
>      *
>      * @param string $search
>      * @param string $replace
>      */
>  
>     public function setImageValue($search, $replace){
>         // Sanity check
>         if (!file_exists($replace))
>         {
>             return;
>         }
>  
>         // Delete current image
>         $this->zipClass->deleteName('word/media/' . $search);
>  
>         // Add a new one
>         $this->zipClass->addFile($replace, 'word/media/' . $search);
>     }
>  
>     /**
>      * Search for the labeled image's rId
>      *
>      * @param string $search
>      */
>  
>     public function searchImageId($search){
>         if (substr($search, 0, 2) !== '${' && substr($search, -1) !== '}') {
>             $search = '${' . $search . '}';
>         }
>         $tagPos = strpos($this->tempDocumentMainPart, $search);
>         $rIdStart = strpos($this->tempDocumentMainPart, 'r:embed="',$tagPos)+9;    
>         $rId=strstr(substr($this->tempDocumentMainPart, $rIdStart),'"', true);
>         return $rId;
>     }
>  
>     /**
>      * Get the image file name for the given rId
>      *
>      * @param string $rId
>      */
>  
>     public function getImgFileName($rId){
>         $tagPos = strpos($this->temporaryDocumentRels, $rId);
>         $fileNameStart = strpos($this->temporaryDocumentRels, 'Target="media/',$tagPos)+14;
>         $fileName=strstr(substr($this->temporaryDocumentRels, $fileNameStart),'"', true);
>         return $fileName;
>     }
>  
>     /**
>      * set the image with the given searchAlt alternate text
>      * @param searchAlt - the alternate text to search for
>      * @param replace - the image filename to replace the image with that is found
>      */
>     public function setImageValueAlt($searchAlt, $replace){
>         $_rid = $this->searchImageId($searchAlt);
>         $_imagefile = $this->getImgFileName($_rid);
>         $this->setImageValue($_imagefile, $replace);
>     }
>     
>
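
After applying the patch, you may want to confirm that your copy of TemplateProcessor.php really declares the added methods. The sketch below is only a textual check and says nothing about how SRF itself detects the method. The function name has_image_patch is made up for illustration, and the path to TemplateProcessor.php depends on your installation.

```shell
#!/bin/sh
# has_image_patch (hypothetical helper, not part of SRF or PHPWord):
# succeed if the given PHP file declares all four methods that the
# patch above adds, fail otherwise.
has_image_patch() {
  for method in setImageValue searchImageId getImgFileName setImageValueAlt; do
    # the "(" directly after the name keeps setImageValue from
    # matching setImageValueAlt
    grep -q "function $method(" "$1" || return 1
  done
}
```

A possible call is: has_image_patch TemplateProcessor.php && echo "patch found".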