PDF operations using Apache PDFBox without stings

PDF is a convenient format for printing output and sharing data with application users. Its ability to scale to any display without distorting the content is often considered its strongest advantage over other output formats. Many applications also need to generate PDFs at runtime, and there are several ways to do so. You can use a reporting tool like Jasper and generate the PDF through its reporting API, or, if you want finer control over PDF generation, use a programming API like iText.

iText's open source edition is licensed under the AGPL, which can hurt badly if you don't understand its legal terms. If in doubt, consult someone who understands open source licenses. Developers often forget to pay attention to the licensing model. Many don't understand what open source really means; they assume it is as free as the air they breathe and unknowingly walk into a license violation. Using open source software without paying attention to its license can result in a lawsuit against the development company or hefty payments to the company that owns the software. Unfortunately, awareness of licensing is far too low.

Why am I going on about all this? Simply because, if you are using iText, you must be ready either to open source your own software or to pay for its use (once again, pay attention to the details; don't just buy any license). OK, so is there any option other than paying up? Yes, there is, and as usual Apache comes to the rescue with Apache PDFBox.

The project released version 1.1.0 in March 2010, and there have been numerous releases since then adding new features. As of now, PDFBox supports many useful features, including:

  • Extract Text from Existing PDF
  • Fill in forms
  • Generate new forms
  • Merge PDF
  • Add Images
  • Manipulate existing PDF
  • Add Security
  • Export PDF to Image
  • PDF/A standard checks (if you don't know about this, that's OK, but if you are building an enterprise application with archiving requirements, you should)
  • And there are many more…

Hello PDF World

OK, enough praise. Let's roll up our sleeves and check out a few APIs. Create a directory and place the following contents inside pom.xml:



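<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<name>Student Class registry</name>
	<version>0.0.1</version>
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.carbonrider.pdfbox.tutorial</groupId>
	<artifactId>simple-pdf</artifactId>
	<packaging>jar</packaging>
	<dependencies>
		<dependency>
			<groupId>org.apache.pdfbox</groupId>
			<artifactId>pdfbox</artifactId>
			<version>2.0.3</version>
		</dependency>
	</dependencies>
</project>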

Create the src\main\java folder structure and then import the pom.xml into your favorite IDE; I am using IntelliJ IDEA 2016 Community Edition. Create a new package, com.carbonrider.pdfbox.tutorial.simple, under the java folder and then create a file named SimplePDF.java in it. We will use SimplePDF to create our first PDF. But before we proceed, let's verify the structure.

[Figure: PDF project structure]

So far so good. Let's add some code for action. Add the following code to SimplePDF.java and run the main method. Don't worry about the details for now; we will go through each of them shortly.

/**
 * 
 */
package com.carbonrider.pdfbox.tutorial.simple;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

import java.io.File;

/**
 * Creates a one-page PDF containing a single line of text.
 *
 * @author Yogesh
 */
public class SimplePDF {

	public static void main(String[] args) throws Exception {
		// Create an empty document and add one A4 page to it.
		PDDocument helloPdf = new PDDocument();
		PDPage page = new PDPage(PDRectangle.A4);
		helloPdf.addPage(page);

		// Write one line of text onto the page.
		PDPageContentStream contentStream = new PDPageContentStream(helloPdf, page);
		contentStream.beginText();
		contentStream.setFont(PDType1Font.HELVETICA_BOLD, 10);
		contentStream.newLineAtOffset(10, 100);
		contentStream.showText("Hello PDF World!!!");
		contentStream.endText();

		// Close the stream before saving, then save and close the document.
		contentStream.close();
		helloPdf.save(new File("D:\\temp\\simple.pdf"));
		helloPdf.close();
	}

}

We just created our first PDF, and you can open it in any PDF reader. Go ahead, have a look… Can you see the text in the PDF? No… really??? OK, scroll down a little… not yet… OK, scroll more… no luck yet… OK, jump to the end of the page. Voilà, there it is. Wait, what just happened? Why is the text almost at the end of the page? Well, it's time to go through the code we just copied.

To create a new PDF document, PDFBox provides the class org.apache.pdfbox.pdmodel.PDDocument. It offers convenient methods for saving the PDF, adding signatures, adding new pages, and so on. To create a new document, just call its no-argument constructor.

PDDocument helloPdf = new PDDocument();

To add content (text, images, forms) to a PDF, it is necessary to create a page first.

PDPage page = new PDPage(PDRectangle.A4);
helloPdf.addPage(page);

Did you notice? The constructor is called with PDRectangle.A4, which specifies the size of the page to be generated. You will find various other options, including LETTER, LEGAL, A0, A1, etc. Change the code and check the output. Once the page instance is initialized with the appropriate size, add it to the document.
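
If you are wondering where these page sizes come from: PDF measures everything in points, with 72 points to an inch. The sketch below is plain Java and does not use PDFBox at all. The class name PageSizes and the helper mmToPoints are made up for illustration. It only shows the arithmetic that turns the paper dimensions of A4 (210 mm x 297 mm) and Letter (8.5 in x 11 in) into points.

```java
class PageSizes {
    // PDF user space is measured in points: 72 points per inch.
    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    /** Converts millimetres to PDF points (hypothetical helper, not a PDFBox API). */
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // ISO A4 paper is 210 mm wide and 297 mm tall.
        float a4Width = mmToPoints(210f);
        float a4Height = mmToPoints(297f);
        System.out.println("A4 in points: " + a4Width + " x " + a4Height);

        // US Letter paper is 8.5 inches wide and 11 inches tall.
        float letterWidth = 8.5f * POINTS_PER_INCH;
        float letterHeight = 11f * POINTS_PER_INCH;
        System.out.println("Letter in points: " + letterWidth + " x " + letterHeight);
    }
}
```

So an A4 page is roughly 595 points wide and 842 points tall, which is worth keeping in mind when you start positioning text on it.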

Now it is time to add some content to the page. Refer to the code below:

PDPageContentStream contentStream = new PDPageContentStream(helloPdf, page);
contentStream.beginText();
contentStream.setFont(PDType1Font.HELVETICA_BOLD, 10);
contentStream.newLineAtOffset(10, 100);
contentStream.showText("Hello PDF World!!!");
contentStream.endText();

To add any content, an instance of PDPageContentStream is required. PDPageContentStream is initialized with two objects: the PDF document and the PDF page. To add text, call the beginText method and then set the font. PDFBox comes with a few default fonts, which are available on most systems. You can also add your own fonts. For the time being, we are using Helvetica Bold. Go ahead and check the other options available in PDType1Font.

The statement contentStream.newLineAtOffset(10, 100); tells PDFBox where the text has to be positioned. The first argument is the x offset and the second is the y offset, both measured in points (1/72 inch). If you wish to see how it affects the output, go ahead and comment out the line. Then play with the y offset, and you will realize that it is measured from the bottom of the page instead of the top.
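
If you would rather think in terms of "distance from the top", a small conversion helps. The sketch below is plain Java with no PDFBox calls. The class TextPosition, the helper yFromTop and the constant A4_HEIGHT (297 mm rounded to 841.89 points) are my own names, assumed for illustration. In real code you would probably read the height from the page itself, for example with page.getMediaBox().getHeight().

```java
class TextPosition {
    /** Approximate height of an A4 page in points (297 mm); an assumed constant. */
    static final float A4_HEIGHT = 841.89f;

    /**
     * Converts a distance measured down from the top edge of the page into
     * the y coordinate PDF expects, which is measured up from the bottom edge.
     * Hypothetical helper, not part of PDFBox.
     */
    static float yFromTop(float pageHeight, float distanceFromTop) {
        return pageHeight - distanceFromTop;
    }

    public static void main(String[] args) {
        // Place the text baseline 100 points below the top edge of an A4 page.
        float y = yFromTop(A4_HEIGHT, 100f);
        System.out.println("y offset to use: " + y);
        // With PDFBox, this value would become the second argument:
        // contentStream.newLineAtOffset(10, y);
    }
}
```

Keep in mind that the offset marks the baseline of the text, so the letters themselves sit slightly above that y value.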

To add text, use contentStream.showText("Hello PDF World!!!"); and the text will be added at the specified position. Once you are done adding text, call the following statement to end the text block.

contentStream.endText();

And finally, close the stream, save the file and close the PDF instance.

contentStream.close();
helloPdf.save(new File("D:\\temp\\simple.pdf"));
helloPdf.close();
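
One thing to watch out for: the hardcoded D:\temp\simple.pdf path only works on a Windows machine with a D: drive, and as far as I know save does not create missing folders for you. The sketch below is one possible way to build a portable output location with plain Java. The class OutputLocation, the method prepareOutput and the folder name pdfbox-tutorial are assumptions made for this example and are not part of PDFBox.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class OutputLocation {
    /**
     * Builds a path for the given file name inside a sub-folder of the
     * system temp directory, creating the folder if it does not exist yet.
     * Hypothetical helper for illustration only.
     */
    static Path prepareOutput(String subDirectory, String fileName) throws IOException {
        Path directory = Paths.get(System.getProperty("java.io.tmpdir"), subDirectory);
        Files.createDirectories(directory); // does nothing if the folder already exists
        return directory.resolve(fileName);
    }

    public static void main(String[] args) throws IOException {
        Path target = prepareOutput("pdfbox-tutorial", "simple.pdf");
        System.out.println("PDF would be saved to: " + target);
        // With PDFBox, the document could then be saved like this:
        // helloPdf.save(target.toFile());
    }
}
```

This keeps the example runnable on Windows, Linux and macOS without editing the path by hand.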

This was a very simple example of the PDF API offered by Apache PDFBox. You will find more examples in my repository.
