How to read pptx files?

Oct 11, 2010 at 4:46 PM

Hi there,

I want to use PHPPowerPoint to read properties such as slide titles from a PowerPoint file.

I tried loading a file in the following way:

/** Error reporting */
error_reporting(E_ALL);

/** Include path **/
set_include_path(get_include_path() . PATH_SEPARATOR . '../Classes/');

/** PHPPowerPoint */
include 'PHPPowerPoint.php';

/** PHPPowerPoint_IOFactory */
include 'PHPPowerPoint/IOFactory.php';

echo date('H:i:s') . " Load from PowerPoint2007 file\n";
$objPHPPowerPoint = PHPPowerPoint_IOFactory::load("01simple.pptx");

But I get the following output:

15:43:01 Load from PowerPoint2007 file 
Fatal error: Uncaught exception 'Exception' with message 'Could not automatically determine PHPPowerPoint_Reader_IReader for file.' in /Applications/XAMPP/xamppfiles/htdocs/5/Classes/PHPPowerPoint/IOFactory.php:185 Stack trace: #0 /Applications/XAMPP/xamppfiles/htdocs/5/Tests/08reader.php(43): PHPPowerPoint_IOFactory::load('01simple.pptx') #1 {main} thrown in /Applications/XAMPP/xamppfiles/htdocs/5/Classes/PHPPowerPoint/IOFactory.php on line 185

Can you please help me solve this problem?

Thanks a lot,

Martin

Oct 12, 2010 at 10:44 AM

There is currently no reader for PowerPoint, only writing files is possible.

Oct 12, 2010 at 2:00 PM

Ah okay... Thanks anyway :)

Can you make a rough estimate when there will be a reader available?

Oct 12, 2010 at 3:33 PM

That's a difficult one :-) Sorry, no estimate on that yet as we have limited resources for doing development...

Dec 13, 2011 at 10:52 AM

Hello,

I am also working on this topic. Is there any help available on it?

Feb 3, 2012 at 9:40 AM

I have got something for reading PowerPoint 2007 files.

 

Feb 3, 2012 at 3:38 PM

And what would that be?

I have been looking for something like that for a while...

Feb 6, 2012 at 9:50 AM

<?php
/**
 * PHPPowerPoint
 *
 * Copyright (c) 2009 - 2010 PHPPowerPoint
 */

/** PHPPowerPoint_Slide */
require_once 'PHPPowerPoint/Slide.php';

/** PHPPowerPoint_Shared_Drawing */
require_once 'PHPPowerPoint/Shared/Drawing.php';

/** PHPPowerPoint_Shape_BaseDrawing */
require_once 'PHPPowerPoint/Shape/BaseDrawing.php';

/** PHPPowerPoint_Shape_Drawing */
require_once 'PHPPowerPoint/Shape/Drawing.php';

/**
 * PHPPowerPoint_Reader_PowerPoint2007
 *
 * @category   PHPPowerPoint
 * @package    PHPPowerPoint_Reader_PowerPoint2007
 * @copyright  Copyright (c) 2009 - 2010 PHPPowerPoint (http://www.codeplex.com/PHPPowerPoint)
 */
class PHPPowerPoint_Reader_PowerPoint2007 implements PHPPowerPoint_Reader_IReader
{
 public function canRead($pFilename)
 {
  // Check if zip class exists
  if (!class_exists('ZipArchive')) {
   return false;
  }

  // Check if file exists
  if (!file_exists($pFilename)) {
   throw new Exception("Could not open " . $pFilename . " for reading! File does not exist.");
  }

  $ppt = false;
  // Load file
  $zip = new ZipArchive;
  if ($zip->open($pFilename) === true) {
   // check if it is an OOXML archive
   $rels = simplexml_load_string($this->_getFromZipArchive($zip, "_rels/.rels"));
   foreach ($rels->Relationship as $rel) {
    switch ($rel["Type"]) {
     case "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument":
      if (basename($rel["Target"]) == 'presentation.xml') {
       $ppt = true;
      }
      break;

    }
   }
   $zip->close();
  }

  return $ppt;
 }
 public function _getFromZipArchive(ZipArchive $archive, $fileName = '')
 {

  
  // Root-relative paths
  if (strpos($fileName, '//') !== false)
  {
   $fileName = substr($fileName, strpos($fileName, '//') + 1);
  }
  //
  //$fileName = PHPPowerPoint_Shared_File::realpath($fileName);
  
  // Apache POI fixes
  $contents = $archive->getFromName($fileName);
  if ($contents === false)
  {
   $contents = $archive->getFromName(substr($fileName, 1));
  }
  /*
  if (strpos($contents, '<?xml') !== false && strpos($contents, '<?xml') !== 0)
  {
   $contents = substr($contents, strpos($contents, '<?xml'));
  }
  echo "<br>********************************************<br>";
  var_dump($fileName);
  var_dump($contents);
  echo "<br>********************************************<br>";
  
  echo "<br>********************************************<br>";
  print_r($contents);
  echo "<br>********************************************<br>";
  */
  return $contents;
 }
 private static function array_item($array, $key = 0) {
  return (isset($array[$key]) ? $array[$key] : null);
 }

 private static function dir_add($base, $add) {
  return preg_replace('~[^/]+/\.\./~', '', dirname($base) . "/$add");
 }

 public function load($pFilename)
 { 
  // Check if file exists
  if (!file_exists($pFilename)) {
   throw new Exception("Could not open " . $pFilename . " for reading! File does not exist.");
  }

  // Initialisations
  $powerpoint = new PHPPowerPoint();
  $powerpoint->removeSlideByIndex(0);
  
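  $zip = new ZipArchive;
  // open() returns true on success and an integer error code on failure
  if ($zip->open($pFilename) !== true) {
   throw new Exception("Could not open " . $pFilename . " for reading! File is not a valid zip archive.");
  }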
  
  $rels = simplexml_load_string($this->_getFromZipArchive($zip, "_rels/.rels")); //~ http://schemas.openxmlformats.org/package/2006/relationships");
  //print_r( $rels);
  foreach ($rels->Relationship as $rel)
  {
  
   switch ($rel["Type"])
   { 
    case "http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties":
     $dir = dirname($rel["Target"]); 
     $xmlCore = simplexml_load_string($this->_getFromZipArchive($zip, $rel['Target'] ));
     if (is_object($xmlCore)) {
      
      $xmlCore->registerXPathNamespace("dc", "http://purl.org/dc/elements/1.1/");
      $xmlCore->registerXPathNamespace("dcterms", "http://purl.org/dc/terms/");
      $xmlCore->registerXPathNamespace("cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties");
      $docProps = $powerpoint->getProperties();
      $docProps->setCreator((string) self::array_item($xmlCore->xpath("dc:creator")));
      $docProps->setLastModifiedBy((string) self::array_item($xmlCore->xpath("cp:lastModifiedBy")));
      $docProps->setCreated(strtotime(self::array_item($xmlCore->xpath("dcterms:created")))); //! respect xsi:type
      $docProps->setModified(strtotime(self::array_item($xmlCore->xpath("dcterms:modified")))); //! respect xsi:type
      $docProps->setTitle((string) self::array_item($xmlCore->xpath("dc:title")));
      $docProps->setDescription((string) self::array_item($xmlCore->xpath("dc:description")));
      $docProps->setSubject((string) self::array_item($xmlCore->xpath("dc:subject")));
      $docProps->setKeywords((string) self::array_item($xmlCore->xpath("cp:keywords")));
      $docProps->setCategory((string) self::array_item($xmlCore->xpath("cp:category")));
     
     }
    break;
    
    case "http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties":
     $xmlApp = simplexml_load_string($this->_getFromZipArchive($zip, $rel['Target'])); 
     if (is_object($xmlApp)) {
     //print_r( $xmlApp);

     }
    
    break;
    case "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument":
     $dir = dirname($rel["Target"]);
     $pptpresentation = simplexml_load_string($this->_getFromZipArchive($zip, $rel["Target"] )); 
     //echo $rel["Target"];
     $pptpresentation->registerXPathNamespace("a", "http://schemas.openxmlformats.org/drawingml/2006/main");
     $pptpresentation->registerXPathNamespace("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
     $pptpresentation->registerXPathNamespace("p", "http://schemas.openxmlformats.org/presentationml/2006/main");
     //echo self::array_item($pptpresentation->xpath("p:sldIdLst/sldId"));

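     // Load the presentation's own relationships (e.g. ppt/_rels/presentation.xml.rels)
     $relsPresentation = simplexml_load_string($this->_getFromZipArchive($zip, $dir . "/_rels/" . basename((string) $rel["Target"]) . ".rels"));
     $slides = array();
     foreach ($relsPresentation->Relationship as $ele) {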
      if ($ele["Type"] == "http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide") {
       $slides[(string) $ele["Id"]] = $ele["Target"];
      }
     }
    break;
   }
  }
  $Content_Types = simplexml_load_string($this->_getFromZipArchive($zip, "[Content_Types].xml")); //~ http://schemas.openxmlformats.org/package/2006/relationships");
  $i =0;  
  foreach ($Content_Types->Override as $Override)
  {
  
   switch ($Override["ContentType"])
   { 
    case "application/vnd.openxmlformats-officedocument.presentationml.slide+xml":
    //print_r ($Override);
    // Create a slide
    $dir = dirname($Override["PartName"]);
    $currentSlide = $powerpoint->createSlide();
        
        
    $slides = simplexml_load_string($this->_getFromZipArchive($zip, $Override["PartName"]));
    $slides->registerXPathNamespace("a", "http://schemas.openxmlformats.org/drawingml/2006/main");
    $slides->registerXPathNamespace("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
    $slides->registerXPathNamespace("p", "http://schemas.openxmlformats.org/presentationml/2006/main");
    
    // Images
    /*
         
    // Read a slide.xml.rels
    $media = array();
    $slidedetail = simplexml_load_string($this->_getFromZipArchive($zip, $dir."/_rels/".basename($Override["PartName"]).".rels"));
    $j = 0;
    // Images
    foreach ($slidedetail->Relationship as $Relationship)
    {
     
     switch ($Relationship["Type"])
     {
      case "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image":
      //echo $Relationship["Target"];
      $images = array();      
      $images['Target']=(string)$Relationship["Target"];
      //echo ($Relationship["Target"]);
      $images['Id']=(string)$Relationship["Id"];
      $media[$i][$j] = $images;
      $i++;
      break;
     }
     
    }
    */

    $slidepic =$slides->xpath("p:cSld/p:spTree/p:pic");
    
    foreach ($slidepic as $pic )
    {
      //name
      $cNvPr=$pic->xpath("p:nvPicPr/p:cNvPr");
      $picLocks=$pic->xpath("p:cNvPicPr/a:picLocks");
      $exts=$pic->xpath("p:spPr/a:xfrm/a:ext");
      $off=$pic->xpath("p:spPr/a:xfrm/a:off");
      $outerShdw =$pic->xpath("p:spPr/a:effectLst/a:outerShdw");
      // The image's relationship ID in the rels file, used to find its path within the pptx
      $blip=$pic->xpath("p:blipFill/a:blip");
      
      $rID = $blip[0]->attributes('r',true)->embed;
      //echo $rID;
      $slidedetail  = simplexml_load_string($this->_getFromZipArchive($zip, $dir."/_rels/".basename($Override["PartName"]).".rels"));
      $slidedetail->registerXPathNamespace("a", "http://schemas.openxmlformats.org/package/2006/relationships");
      $imagePath =$slidedetail->xpath("a:Relationship[@Id='$rID']");

     
      //echo ($imagePath[0]->attributes()->Target)."<br>*****************111***************************<br>";;
      
      $shape = $currentSlide->createDrawingShape();
      
      $shape->setName($cNvPr[0]->attributes()->name);
      $shape->setDescription($cNvPr[0]->attributes()->descr);
      $zip->extractTo("./images/" , "ppt".substr(dirname($imagePath[0]->attributes()->Target), 2)."/".basename($imagePath[0]->attributes()->Target));     
      //$shape->setPath("./images/".basename($imagePath[0]->attributes()->Target));
      $shape->setPath("./images/"."ppt".substr(dirname($imagePath[0]->attributes()->Target), 2)."/".basename($imagePath[0]->attributes()->Target));
      
      $shape->setHeight(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($exts[0]->attributes()->cy)));
      $shape->setWidth(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($exts[0]->attributes()->cx)));
      $shape->setOffsetX(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($off[0]->attributes()->x)));
      $shape->setOffsetY(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($off[0]->attributes()->y)));
      // xpath() returns an array, so test for a non-empty result rather than an object
      if (!empty($outerShdw)) {
      //$shape->setRotation(25);
      $shape->getShadow()->setVisible(true);
      $shape->getShadow()->setDirection(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($outerShdw[0]->attributes()->dir)));
      $shape->getShadow()->setDistance(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($outerShdw[0]->attributes()->dist)));
      $shape->getAlignment()->setHorizontal( (string)self::array_item($outerShdw[0]->attributes()->algn));
      }
/**/
    }

    // Text
    $slidesp =$slides->xpath("p:cSld/p:spTree/p:sp");///p:txBody/a:p/a:r/a:t");
    foreach ($slidesp as $sp )
    {
     $exts=$sp->xpath("p:spPr/a:xfrm/a:ext");
     $rPr=$sp->xpath("p:txBody/a:p/a:r/a:rPr");
     $srgbClr =$sp->xpath("p:txBody/a:p/a:r/a:rPr/a:solidFill/a:srgbClr");
     $name =$sp->xpath("p:txBody/a:p/a:r/a:rPr/a:latin");
     $off=$sp->xpath("p:spPr/a:xfrm/a:off");
     $pPr=$sp->xpath("p:txBody/a:p/a:pPr");
     
     $shape = $currentSlide->createRichTextShape();
     //$shape->setHeight((string)self::array_item($exts[0]->attributes()->cy)/9525);
     
     $shape->setHeight(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($exts[0]->attributes()->cy)));
     $shape->setWidth(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($exts[0]->attributes()->cx)));
     $shape->setOffsetX(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($off[0]->attributes()->x)));
     $shape->setOffsetY(PHPPowerPoint_Shared_Drawing::EMUTopixels((string)self::array_item($off[0]->attributes()->y)));
     //echo "<br>*****************111***************************<br>";
     $shape->getAlignment()->setHorizontal( (string)self::array_item($pPr[0]->attributes()->algn));
 
     $textRun = $shape->createTextRun((string)self::array_item($sp->xpath ("p:txBody/a:p/a:r/a:t")));
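     // Boolean attributes appear as "1"/"0" or "true"/"false"; a missing attribute means "off"
     $bold = (string) $rPr[0]->attributes()->b;
     $textRun->getFont()->setBold($bold === '1' || $bold === 'true');

     // sz is stored in hundredths of a point; skip it when the attribute is absent
     $sz = (string) $rPr[0]->attributes()->sz;
     if ($sz !== '') {
      $textRun->getFont()->setSize((int) $sz / 100);
     }

     $italic = (string) $rPr[0]->attributes()->i;
     $textRun->getFont()->setItalic($italic === '1' || $italic === 'true');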

     $textRun->getFont()->setUnderline(self::array_item($rPr[0]->attributes()->u));
     $textRun->getFont()->setName((string)self::array_item($name[0]->attributes()->typeface));

     $textRun->getFont()->setColor(new PHPPowerPoint_Style_Color( 'FF'.(string)self::array_item($srgbClr[0]->attributes()->val)));
     //echo (self::array_item($srgbClr[0]->attributes()->val));

    }    
    
    
    break; 
   }
  }
  $zip->close();

  return $powerpoint;
 }
}

Feb 6, 2012 at 9:54 AM

I will do something for Word2007.php.

Feb 22, 2012 at 4:41 AM

Can you help me, please? The above code is not working for me. Please help me out with this.

Apr 18, 2012 at 12:31 AM

Thanks for the great code, exiwin.

I took exiwin's code from above, fixed a few minor bugs in it, and added support for a few simple elements that PowerPoint generates, then submitted it as a patch to PHPPowerPoint.  Until it is integrated you can find it in patch #11996 at http://phppowerpoint.codeplex.com/SourceControl/list/patches

It supports reading pptx files generated by PHPPowerPoint, so far as I have tested, and has rudimentary support for files generated by PowerPoint.  PHPPowerPoint lacks support for much of the pptx format, such as SlideLayouts and SlideMasters, but you may have good results with a simple black-text presentation.  Solid background colors (and maybe background images (I didn't test)) are converted to layout objects, so there is hefty conversion going on, but it will hopefully still look the same.  Part of the reason for favoring conversion is to keep PHPPowerPoint's underlying object structure intact and preserve the possibility of importing pptx and exporting odp.

A test was added to the test suite to demonstrate and validate reading pptx files.

My plans from here are to add SlideLayout and SlideMaster support to PHPPowerPoint, and support for reading them out of pptx files, since without these I cannot approach my goal of reprocessing/reformatting a library of existing PowerPoint files that I have.

Aug 15, 2012 at 5:48 PM

Thanks for this.

I am looking for a very simple word count in PowerPoint files.

The functionality should be that an external ppt or pptx file is loaded and then either the metadata.statistics.wordcount is read and output or that all words in the PHPPowerPoint object are counted.

I am having a tough time finding anything on phppowerpoint.codeplex.com or even anywhere else on the web that does that.

thx,

Martin
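
For the second option in the post above (counting the words yourself), a minimal sketch could bypass PHPPowerPoint and walk the slide XML directly. It assumes the slides are stored as ppt/slides/slideN.xml and that visible text sits in a:t elements; count_pptx_slide_words() is a hypothetical helper name, not part of any library. Each text run is counted on its own, so a word split across two runs would be counted twice.

```php
<?php
// Sketch only: count whitespace-separated words in every slide of a pptx.
// count_pptx_slide_words() is a hypothetical helper, not a PHPPowerPoint API.
function count_pptx_slide_words($pptxPath)
{
    $zip = new ZipArchive();
    // open() returns true on success, an integer error code otherwise
    if ($zip->open($pptxPath) !== true) {
        return null;
    }

    $total = 0;
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Assumption: slide parts are named ppt/slides/slide<N>.xml
        if (!preg_match('~^ppt/slides/slide\d+\.xml$~', $name)) {
            continue;
        }
        $slide = simplexml_load_string($zip->getFromIndex($i));
        if ($slide === false) {
            continue; // skip parts that are not well-formed XML
        }
        $slide->registerXPathNamespace('a', 'http://schemas.openxmlformats.org/drawingml/2006/main');
        // Every a:t element holds the text of one run
        foreach ($slide->xpath('//a:t') as $run) {
            $words = preg_split('/\s+/u', trim((string) $run), -1, PREG_SPLIT_NO_EMPTY);
            $total += count($words);
        }
    }
    $zip->close();

    return $total;
}
```

Calling count_pptx_slide_words('deck.pptx') would return an integer, or null if the file cannot be opened as a zip archive. Speaker notes and text in layouts or masters are not covered by this sketch.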

Aug 17, 2012 at 1:44 PM

Hi Martin,

The PHPPowerPoint software is not designed to be a full-feature pptx library but rather a basic generic presentation stack creator.  The conceptual design of the pptx format is largely absent from the software, replaced with an alternate design concept that works quite well for generating presentation stacks that can be exported to multiple formats.  If full import of the pptx format were desired, it would only make sense to start from scratch and do a complete rewrite.

That said, a pptx file is actually just a zip file with its .zip extension changed to .pptx.  Try changing the filename, then unzipping the file, and have a look around the xml files contained within.  It sounds like you have a good idea with the metadata.statistics.wordcount, so dig around and see if you can figure out where they store that property.  These are all human-readable files, so they should be moderately approachable even if they are in need of a good xml formatter (see xmllint for that).  You could probably develop a small single-purpose tool to extract this property from the files with a small amount of work.

Good luck!
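
The single-purpose tool suggested above could look roughly like the following sketch. It assumes the statistic is the Words element in docProps/app.xml, which is where a later post in this thread reports finding it, and it reads that part straight from the archive without renaming or unpacking anything. read_pptx_word_count() is a hypothetical helper name.

```php
<?php
// Sketch only: read the <Words> statistic from docProps/app.xml inside a pptx.
// read_pptx_word_count() is a hypothetical helper, not a PHPPowerPoint API.
function read_pptx_word_count($pptxPath)
{
    $zip = new ZipArchive();
    // open() returns true on success and an integer error code on failure,
    // so the comparison has to be strict
    if ($zip->open($pptxPath) !== true) {
        return null;
    }

    // Fetch the part directly from the archive; nothing is written to disk
    $appXml = $zip->getFromName('docProps/app.xml');
    $zip->close();
    if ($appXml === false) {
        return null; // the archive has no docProps/app.xml part
    }

    $xml = simplexml_load_string($appXml);
    if ($xml === false || !isset($xml->Words)) {
        return null; // malformed XML or no <Words> element
    }

    return (int) $xml->Words;
}
```

The value is whatever the authoring application stored when the file was last saved, so it may not match a manual count of the slide text.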

Aug 17, 2012 at 4:34 PM
Thx a lot - I figured it out by unzipping the file and then reading the app.xml file. Brilliant!
If you need the code, let me know.
Cheers and thx
Martin

Sent from my iPhone

Aug 29, 2012 at 8:00 AM

Hi

Please guide me: how do I use this library in my HTML code?

Sep 24, 2013 at 8:25 AM
Hi mdiessner,
      Can you please share the code for reading a pptx file? Thanks in advance.
Sep 24, 2013 at 8:27 AM
Hi mdiessner,
   Can you please share the code for reading a pptx file? Please help with this. Thanks in advance.

Thanks.
Sep 24, 2013 at 8:29 AM
hi catrane,
   Can you please share the code for reading a pptx file? And please tell me where to paste that reader.php.
Please help with this. Thanks in advance.
Sep 24, 2013 at 7:31 PM
Here we go
<?php
function delete_directory($dir) {
    if ($handle = opendir($dir)) {
        $array = array();
        while (false !== ($file = readdir($handle))) {
            if ($file != "." && $file != "..") {
                if(is_dir($dir.$file)) {
                    if(!@rmdir($dir.$file)) {
                        delete_directory($dir.$file.'/'); 
                    }
                }
                else {
                 @unlink($dir.$file);
                }
            }
        }
        closedir($handle);
        @rmdir($dir);
    }
}

//check if form submitted a file
if(isset($_POST['file_upload'])) {
    $target_path = TEMPLATEPATH.'/uploads/';    
    $target_path = $target_path . basename( $_FILES['uploadedfile']['name']);

    //upload the file
    if(move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) {
        $file_name = $target_path;
    } else {
        echo "There was an error uploading the file to ".$target_path.", please try again!";
    }
    
    //check if the file can be opened - maybe need to change the Apache max file upload settings
    $file_handle = fopen($file_name, "r");  
    if ($file_handle == FALSE) {
        echo 'Error on fopen('.$file_name.')';
    } else {    
        //rename the pptx file to zip and then unzip    
        $path = pathinfo(realpath($file_name));     
        $path = $path['dirname'].'/'.rand(10000000, 99999999);
        $zip = new ZipArchive;
        $res = $zip->open($file_name);  
        // open() returns an integer error code on failure, which "== true" would treat as success
        if($res === true) {
            $zip->extractTo($path);
            $zip->close();

            //in unzipped file read /docProps/app.xml and look for <Words>4</Words>
            $xml = simplexml_load_file($path.'/docProps/app.xml');
            echo $xml->getName() . "<br />";

            //look for words metadata
            foreach($xml->children() as $child) {
                if($child->getName()=='Words') {
                    $words = $child;
                    break;
                }
            }
            
            //delete directory
            $d = delete_directory($path.'/');           
            
            //return words
            echo isset($words) ? $words : 'No Words element found in app.xml';
            
        } else {
            echo 'Could not unzip file';
        }
    }
    //delete file uploaded and unzipped directory
    fclose($file_handle);
    unlink($file_name);     
}
?>
    
    
<form enctype="multipart/form-data" action="<?php echo htmlspecialchars($_SERVER["REQUEST_URI"]); ?>" method="POST">
<input type="hidden" name="file_upload" id="file_upload" value="true" />
<input type="hidden" name="MAX_FILE_SIZE" value="1000000" />
<input name="uploadedfile" id="uploadedfile" type="file" onchange="this.form.submit();" style="width: 0px;height: 0px;border-size: 0px;border-style: none;padding: 0px;visibility:hidden;"/>                
<a href="#" onclick ="javascript:document.getElementById('uploadedfile').click();">FILE</a>
</form>
    
Sep 25, 2013 at 10:18 AM
Hi mdiessner,
   Thanks for sending your code. I tried it, but it is not working for me. When I upload a zip file, ZipArchive works, but it does not work for a pptx file, and I don't know why. If ZipArchive needs to be extended for pptx, please tell me. Please help with this.

Thanks.
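
One way to narrow down a failure like this is to look at what ZipArchive::open() actually returns, since it gives back an integer error code rather than false when it fails. The sketch below is only a diagnostic aid under that assumption; describe_zip_open() is a hypothetical helper name and the messages are illustrative. A result of "not a zip archive" would suggest, for example, a legacy binary .ppt file or an incomplete upload rather than a real .pptx.

```php
<?php
// Sketch only: turn the result of ZipArchive::open() into a readable message.
// describe_zip_open() is a hypothetical helper, not a PHPPowerPoint API.
function describe_zip_open($path)
{
    $zip = new ZipArchive();
    $res = $zip->open($path);
    if ($res === true) {
        $zip->close();
        return 'ok';
    }

    // A few of the error constants defined on ZipArchive
    $names = array(
        ZipArchive::ER_NOENT  => 'file does not exist',
        ZipArchive::ER_NOZIP  => 'not a zip archive',
        ZipArchive::ER_OPEN   => 'cannot open file',
        ZipArchive::ER_READ   => 'read error',
        ZipArchive::ER_INCONS => 'zip archive inconsistent',
        ZipArchive::ER_MEMORY => 'memory allocation failure',
    );

    return isset($names[$res]) ? $names[$res] : 'error code ' . var_export($res, true);
}
```

Printing describe_zip_open($file_name) right after the upload step would show whether the uploaded file arrived as a valid archive before any extraction is attempted.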
Sep 25, 2013 at 10:23 AM
Thx. Sorry I am on holidays now for one month and don't have time to look into this

Sent from my iPhone