michaelknigge/afpbox

afpbox

Java library for parsing AFP (MO:DCA) printer data streams.

NOTE: This project is still a work in progress.

Dependencies

afpbox has no runtime dependencies on other libraries. This was a design decision and will (hopefully) never change.

Usage

afpbox is published on jcenter, so adding it to your project is straightforward. First, add afpbox to your build file. If you use Maven, add the following dependency:

<dependency>
  <groupId>de.textmode.afpbox</groupId>
  <artifactId>afpbox</artifactId>
  <version>0.4</version>
  <type>pom</type>
</dependency>

If you use Gradle, add this:

dependencies {
    // Use "compile" instead on Gradle versions older than 3.4.
    implementation 'de.textmode.afpbox:afpbox:0.4'
}

AFP-Parser

AFP (MO:DCA) is a record-oriented data stream. For this reason you need a RecordReader first. You can implement your own, but afpbox comes with two common implementations of a RecordReader.

The StandardRecordReader is probably the RecordReader you want. It expects every record to start with the special control character X'5A' and determines the record length from the two bytes that follow.
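
To illustrate this framing, here is a minimal standalone sketch that splits a byte array into X'5A' records. It is not the afpbox implementation, and the class and method names are hypothetical. It assumes that the two-byte big-endian length counts the record including the two length bytes themselves but not the X'5A' prefix, which is how the MO:DCA reference describes the structured field length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of afpbox: splits X'5A'-framed AFP data into records. */
final class Afp5aSplitter {

    /** Returns every record without its X'5A' prefix (the length bytes are kept). */
    static List<byte[]> split(final byte[] data) {
        final List<byte[]> records = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            if ((data[pos] & 0xFF) != 0x5A) {
                throw new IllegalArgumentException("Expected X'5A' at offset " + pos);
            }
            if (pos + 3 > data.length) {
                throw new IllegalArgumentException("Truncated length field at offset " + pos);
            }
            // Big-endian length. Assumption: it includes the two length bytes but not the X'5A'.
            final int len = ((data[pos + 1] & 0xFF) << 8) | (data[pos + 2] & 0xFF);
            final int end = pos + 1 + len;
            if (len < 2 || end > data.length) {
                throw new IllegalArgumentException("Invalid record length " + len + " at offset " + pos);
            }
            records.add(Arrays.copyOfRange(data, pos + 1, end));
            pos = end;
        }
        return records;
    }

    public static void main(final String[] args) {
        // Two records, each consisting only of an 8-byte introducer (BPG and EPG).
        final byte[] data = {
            0x5A, 0x00, 0x08, (byte) 0xD3, (byte) 0xA8, (byte) 0xAF, 0x00, 0x00, 0x00,
            0x5A, 0x00, 0x08, (byte) 0xD3, (byte) 0xA9, (byte) 0xAF, 0x00, 0x00, 0x00,
        };
        System.out.println("Records found: " + split(data).size());
    }
}
```

The sketch works on a byte array to stay short; a real reader would pull the same three bytes from an InputStream before reading the rest of the record.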

The MvsRecordReader expects that every AFP record is prefixed with four bytes. The record length is determined from the first two bytes. The following two bytes are ignored. This record format corresponds to the record format VB on z/OS (formerly known as OS/390, which was formerly known as MVS).
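
The four-byte prefix can be sketched in the same way. Again, this is a standalone illustration with hypothetical names and not the code of the MvsRecordReader. It assumes that the length in the first two bytes is big-endian and counts the four prefix bytes themselves, as is the convention for the record descriptor word of VB records on z/OS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of afpbox: splits data framed like z/OS VB records. */
final class VbRecordSplitter {

    /** Returns the payload of every record, that is the record without its 4-byte prefix. */
    static List<byte[]> split(final byte[] data) {
        final List<byte[]> records = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            if (pos + 4 > data.length) {
                throw new IllegalArgumentException("Truncated prefix at offset " + pos);
            }
            // Assumption: the length counts the 4-byte prefix itself.
            final int len = ((data[pos] & 0xFF) << 8) | (data[pos + 1] & 0xFF);
            // The bytes at pos + 2 and pos + 3 are ignored.
            if (len < 4 || pos + len > data.length) {
                throw new IllegalArgumentException("Invalid record length " + len + " at offset " + pos);
            }
            records.add(Arrays.copyOfRange(data, pos + 4, pos + len));
            pos += len;
        }
        return records;
    }

    public static void main(final String[] args) {
        // One record: 4-byte prefix (length 13) followed by a 9-byte payload.
        final byte[] data = {
            0x00, 0x0D, 0x00, 0x00,
            0x5A, 0x00, 0x08, (byte) 0xD3, (byte) 0xA8, (byte) 0xAF, 0x00, 0x00, 0x00,
        };
        System.out.println("Payload bytes: " + split(data).get(0).length);
    }
}
```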

Once you have a RecordReader, you also need a RecordHandler. The main idea behind the RecordHandler is that the application controls which structured fields are parsed and which are not. You have to implement a RecordHandler according to your needs.

When you have a RecordReader and a RecordHandler, you are ready to create an AfpParser. Let's build a sample application that counts the pages of an AFP file, so you get the idea behind the design of afpbox:

// A local variable that is modified inside an anonymous class must be effectively final,
// so the counter is kept in an AtomicInteger (java.util.concurrent.atomic.AtomicInteger).
final AtomicInteger pageCounter = new AtomicInteger();
final InputStream is = new FileInputStream("myfile.afp");
final RecordHandler rh = new RecordHandler() {

    @Override
    public void handleLineRecord(final Record record) {
        // We just ignore line records (we don't support mixed-mode files in this sample).
    }

    @Override
    public boolean handleStructuredFieldIntroducer(final StructuredFieldIntroducer sfi) {
        // *ONLY* if the read record is a "Begin Page" (BPG) structured field: parse
        // the structured field and pass the parsed structured field to method
        // "handleStructuredField" of the RecordHandler.
        return sfi.getStructuredFieldIdentifier() == StructuredFieldIdentifier.BPG;
    }

    @Override
    public void handleStructuredField(final StructuredField sf) {
        // We only get invoked on structured field "Begin Page" (BPG) - see above...
        pageCounter.incrementAndGet();
    }

    @Override
    public void handleFaultyStructuredField(final FaultyStructuredField sf) {
        // Hopefully we don't see faulty structured fields in our file...
    }
};

new AfpParser(new StandardRecordReader(is), rh).parse();

System.out.println("Pages in this file: " + pageCounter.get());

PTOCA-Parser

If you want to parse PTOCA control sequences, you have to combine the PTOCA data of all PTX structured fields (within a Presentation Text Block) and parse this combined data.
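
Combining the PTX data can be as simple as appending the chunks in the order in which they appear. The following sketch assumes that the application has already collected the data portion of every PTX structured field as a byte array (for example in its RecordHandler). The class and method names are hypothetical and not part of afpbox.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of afpbox: joins the data of several PTX structured fields. */
final class PtocaBlockBuilder {

    /** Concatenates the given PTX payloads in list order into one PTOCA block. */
    static byte[] combine(final List<byte[]> ptxPayloads) {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        for (final byte[] payload : ptxPayloads) {
            // write(byte[], int, int) of ByteArrayOutputStream does not declare IOException.
            baos.write(payload, 0, payload.length);
        }
        return baos.toByteArray();
    }

    public static void main(final String[] args) {
        // Placeholder bytes only - these are not meaningful PTOCA data.
        final List<byte[]> chunks = Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4, 5});
        System.out.println("Combined length: " + combine(chunks).length);
    }
}
```

The resulting byte array is what would be handed to the parser as the combined PTOCA data.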

afpbox provides a PtocaParser for this PTOCA data. To use the PtocaParser you need to implement a PtocaControlSequenceHandler according to your needs. The design follows the same idea as the RecordHandler above: the application decides which control sequences are parsed and which are not.

Here is an example of how to use the PtocaParser. This sample removes all NOPs from a PTOCA block and constructs a new PTOCA block. The sample is simplistic and incomplete, but it shows the idea behind the PtocaControlSequenceHandler and how to use it.

final ByteArrayOutputStream baos = new ByteArrayOutputStream();

PtocaParser.parse(ptocaBlock, new PtocaControlSequenceHandler() {

    @Override
    public boolean handleControSequence(final int functionType, final byte[] data, final int off) {
        // *ONLY* if the PTOCA function type is not "No Operation" (NOP - no matter if chained
        // or unchained): parse the PTOCA control sequence and pass it to the other
        // "handleControSequence" method of the PtocaControlSequenceHandler.
        return functionType != PtocaControlSequenceFunctionType.NOP_UNCHAINED
                && functionType != PtocaControlSequenceFunctionType.NOP_CHAINED;
    }

    @Override
    public void handleControSequence(final PtocaControlSequence controlSequence) {
        // write(byte[], int, int) of ByteArrayOutputStream does not declare IOException.
        final byte[] data = controlSequence.getData();
        baos.write(data, 0, data.length);
    }

    @Override
    public void handleCodePoints(final byte[] codePoints, final int off, final int len) {
        // Copy the text (code points) unchanged.
        baos.write(codePoints, off, len);
    }
});
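
The sample has to test two NOP constants because every PTOCA control sequence exists in an unchained and a chained form. In PTOCA the chained function type is the unchained value with the low-order bit set, so the two tests can be folded into one. The sketch below defines its own hypothetical constants (X'F8' and X'F9' for NOP, taken from the PTOCA reference) instead of using the PtocaControlSequenceFunctionType class of afpbox, whose values are not shown here.

```java
/** Hypothetical helper, not part of afpbox: works on raw PTOCA function type bytes. */
final class PtocaFunctionTypes {

    // Assumed values from the PTOCA reference, not read from afpbox.
    static final int NOP_UNCHAINED = 0xF8;
    static final int NOP_CHAINED = 0xF9;

    /** A function type with the low-order bit set is the chained form. */
    static boolean isChained(final int functionType) {
        return (functionType & 0x01) != 0;
    }

    /** Clears the low-order bit, which maps both forms to the unchained value. */
    static int toUnchained(final int functionType) {
        return functionType & 0xFE;
    }

    /** True for the chained as well as the unchained NOP. */
    static boolean isNop(final int functionType) {
        return toUnchained(functionType) == NOP_UNCHAINED;
    }

    public static void main(final String[] args) {
        System.out.println("X'F9' is a NOP: " + isNop(NOP_CHAINED));
        System.out.println("X'F9' is chained: " + isChained(NOP_CHAINED));
    }
}
```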

Structured Fields

The following table shows which Structured Fields are currently supported ("supported" means that afpbox can parse the Structured Field and create a specific Java object for it).

Acronym Identifier Structured Field Name Supported
BAG X'D3A8C9' Begin Active Environment Group
BBC X'D3A8EB' Begin Bar Code Object
BDA X'D3EEEB' Bar Code Data
BDD X'D3A6EB' Bar Code Data Descriptor
BDG X'D3A8C4' Begin Document Environment Group
BDI X'D3A8A7' Begin Document Index
BDT X'D3A8A8' Begin Document
BFG X'D3A8C5' Begin Form Environment Group
BFM X'D3A8CD' Begin Form Map
BGR X'D3A8BB' Begin Graphics Object
BII X'D3A87B' Begin IM Image
BIM X'D3A8FB' Begin Image Object
BMM X'D3A8CC' Begin Medium Map
BMO X'D3A8DF' Begin Overlay
BNG X'D3A8AD' Begin Named Page Group
BOC X'D3A892' Begin Object Container
BOG X'D3A8C7' Begin Object Environment Group
BPF X'D3A8A5' Begin Print File
BPG X'D3A8AF' Begin Page
BPS X'D3A85F' Begin Page Segment
BPT X'D3A89B' Begin Presentation Text Object
BRG X'D3A8C6' Begin Resource Group
BRS X'D3A8CE' Begin Resource
BSG X'D3A8D9' Begin Resource Environment Group
CDD X'D3A692' Container Data Descriptor
CTC X'D3A79B' Composed Text Control
EAG X'D3A9C9' End Active Environment Group
EBC X'D3A9EB' End Bar Code Object
EDG X'D3A9C4' End Document Environment Group
EDI X'D3A9A7' End Document Index
EDT X'D3A9A8' End Document
EFG X'D3A9C5' End Form Environment Group
EFM X'D3A9CD' End Form Map
EGR X'D3A9BB' End Graphics Object
EII X'D3A97B' End IM Image
EIM X'D3A9FB' End Image Object
EMM X'D3A9CC' End Medium Map
EMO X'D3A9DF' End Overlay
ENG X'D3A9AD' End Named Page Group
EOC X'D3A992' End Object Container
EOG X'D3A9C7' End Object Environment Group
EPF X'D3A9A5' End Print File
EPG X'D3A9AF' End Page
EPS X'D3A95F' End Page Segment
EPT X'D3A99B' End Presentation Text Object
ERG X'D3A9C6' End Resource Group
ERS X'D3A9CE' End Resource
ESG X'D3A9D9' End Resource Environment Group
FGD X'D3A6C5' Form Environment Group Descriptor
GAD X'D3EEBB' Graphics Data
GDD X'D3A6BB' Graphics Data Descriptor
ICP X'D3AC7B' IM Image Cell Position
IDD X'D3A6FB' Image Data Descriptor
IEL X'D3B2A7' Index Element
IID X'D3A67B' Image Input Descriptor
IMM X'D3ABCC' Invoke Medium Map
IOB X'D3AFC3' Include Object
IOC X'D3A77B' IM Image Output Control
IPD X'D3EEFB' Image Picture Data
IPG X'D3AFAF' Include Page
IPO X'D3AFD8' Include Page Overlay
IPS X'D3AF5F' Include Page Segment
IRD X'D3EE7B' IM Image Raster Data
LLE X'D3B490' Link Logical Element
MBC X'D3ABEB' Map Bar Code Object
MCC X'D3A288' Medium Copy Count
MCD X'D3AB92' Map Container Data
MCF X'D3AB8A' Map Coded Font
MCF-1 X'D3B18A' Map Coded Font Format-1
MDD X'D3A688' Medium Descriptor
MDR X'D3ABC3' Map Data Resource
MFC X'D3A088' Medium Finishing Control
MGO X'D3ABBB' Map Graphics Object
MIO X'D3ABFB' Map Image Object
MMC X'D3A788' Medium Modification Control
MMD X'D3ABCD' Map Media Destination
MMO X'D3B1DF' Map Medium Overlay
MMT X'D3AB88' Map Media Type
MPG X'D3ABAF' Map Page
MPO X'D3ABD8' Map Page Overlay
MPS X'D3B15F' Map Page Segment
MPT X'D3AB9B' Map Presentation Text
MSU X'D3ABEA' Map Suppression
NOP X'D3EEEE' No Operation
OBD X'D3A66B' Object Area Descriptor
OBP X'D3AC6B' Object Area Position
OCD X'D3EE92' Object Container Data
PEC X'D3A7A8' Presentation Environment Control
PFC X'D3B288' Presentation Fidelity Control
PGD X'D3A6AF' Page Descriptor
PGP X'D3B1AF' Page Position
PGP-1 X'D3ACAF' Page Position Format-1
PMC X'D3A7AF' Page Modification Control
PPO X'D3ADC3' Preprocess Presentation Object
PTD X'D3B19B' Presentation Text Data Descriptor
PTD-1 X'D3A69B' Presentation Text Descriptor Format-1
PTX X'D3EE9B' Presentation Text Data
TLE X'D3A090' Tag Logical Element
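
The identifiers in this table follow a visible pattern: every Begin structured field X'D3A8xx' has a matching End structured field X'D3A9xx' with the same last byte (for example BPG X'D3A8AF' and EPG X'D3A9AF'). The following sketch uses this pattern to classify identifiers. It uses plain int values and hypothetical helper names; it does not use the StructuredFieldIdentifier type of afpbox.

```java
/** Hypothetical helper, not part of afpbox: classifies 3-byte structured field identifiers. */
final class StructuredFieldIds {

    static final int BPG = 0xD3A8AF;
    static final int EPG = 0xD3A9AF;

    /** True for identifiers of the form X'D3A8xx' (the Begin structured fields in the table). */
    static boolean isBegin(final int id) {
        return (id & 0xFFFF00) == 0xD3A800;
    }

    /** True for identifiers of the form X'D3A9xx' (the End structured fields in the table). */
    static boolean isEnd(final int id) {
        return (id & 0xFFFF00) == 0xD3A900;
    }

    /** Maps a Begin identifier to the End identifier with the same last byte. */
    static int endFor(final int beginId) {
        if (!isBegin(beginId)) {
            throw new IllegalArgumentException("Not a Begin structured field: " + toHex(beginId));
        }
        return (beginId & 0xFF00FF) | 0x00A900;
    }

    /** Formats an identifier in the notation used by the table, for example X'D3A8AF'. */
    static String toHex(final int id) {
        return String.format("X'%06X'", id);
    }

    public static void main(final String[] args) {
        System.out.println(toHex(BPG) + " is closed by " + toHex(endFor(BPG)));
    }
}
```

A helper like this can be handy in a RecordHandler that tracks nesting, for example to check that each Begin Page is followed by an End Page.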

Triplets

The following table lists the Triplets and whether afpbox currently supports each of them ("supported" means that afpbox can parse the Triplet and create a specific Java object for it).

ID Name Supported
X'4D' Area Definition
X'80' Attribute Qualifier
X'36' Attribute Value
X'26' Character Rotation
X'96' CMR Tag Fidelity
X'01' Coded Graphic Character Set Global ID
X'75' Color Fidelity
X'91' Color Management Resource Descriptor
X'4E' Color Specification
X'65' Comment
X'8B' Data-Object Font Descriptor
X'43' Descriptor Position
X'97' Device Appearance
X'50' Encoding Scheme ID
X'22' Extended Resource Local ID
X'88' Finishing Fidelity
X'85' Finishing Operation
X'20' Font Coded Graphic Character Set Global Identifier
X'1F' Font Descriptor Specification
X'78' Font Fidelity
X'5D' Font Horizontal Scale Factor
X'84' Font Resolution and Metric Technology
X'02' Fully Qualified Name
X'9A' Image Resolution
X'73' IMM Insertion (Retired)
X'9D' Keep Group Together
X'27' Line Data Object Position Migration (Retired)
X'62' Local Date and Time Stamp
X'8C' Locale Selector
X'04' Mapping Option
X'45' Media Eject Control
X'87' Media Fidelity
X'56' Medium Map Page Number
X'68' Medium Orientation
X'8F' MO:DCA Function Set
X'18' MO:DCA Interchange Set
X'4B' Object Area Measurement Units
X'4C' Object Area Size
X'57' Object Byte Extent
X'2D' Object Byte Offset
X'63' Object Checksum (Retired)
X'10' Object Classification
X'9C' Object Container Presentation Space Size
X'5E' Object Count
X'21' Object Function Set Specification (Retired)
X'5A' Object Offset
X'64' Object Origin Identifier (Retired)
X'59' Object Structured Field Extent
X'58' Object Structured Field Offset
X'46' Page Overlay Conditional Processing (Retired)
X'81' Page Position Information
X'82' Parameter Value
X'83' Presentation Control
X'71' Presentation Space Mixing Rules
X'70' Presentation Space Reset Mixing
X'95' Rendering Intent
X'24' Resource Local ID
X'6C' Resource Object Include
X'21' Resource Object Type
X'25' Resource Section Number
X'47' Resource Usage Attribute (Retired)
X'86' Text Fidelity
X'1D' Text Orientation (Retired)
X'74' Toner Saver
X'FF' Triplet Extender
X'72' Universal Date and Time Stamp
X'8E' UP3i Finishing Operation
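
The ID column is the second byte of a triplet. As a rough illustration of how triplets are laid out, the sketch below walks a byte array that contains only triplets and collects their IDs. It assumes the layout from the MO:DCA reference: one length byte that counts the whole triplet including itself, one ID byte, then the triplet data. The class is hypothetical and not part of afpbox.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of afpbox: lists the IDs of consecutive triplets. */
final class TripletWalker {

    /** Returns the ID of every triplet found in the given data, in order of appearance. */
    static List<Integer> ids(final byte[] data) {
        final List<Integer> result = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            // Assumption: the length byte counts the length byte, the ID byte and the data.
            final int len = data[pos] & 0xFF;
            if (len < 2 || pos + len > data.length) {
                throw new IllegalArgumentException("Invalid triplet length " + len + " at offset " + pos);
            }
            result.add(data[pos + 1] & 0xFF);
            pos += len;
        }
        return result;
    }

    public static void main(final String[] args) {
        // Two triplets with placeholder data: X'65' (Comment) and X'01'.
        final byte[] data = {0x04, 0x65, 0x40, 0x40, 0x06, 0x01, 0x00, 0x00, 0x00, 0x00};
        for (final int id : ids(data)) {
            System.out.println(String.format("Triplet X'%02X'", id));
        }
    }
}
```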

PTOCA Control Sequence

The following table shows which PTOCA Control Sequences are currently supported ("supported" means that afpbox can parse the PTOCA Control Sequence and create a specific Java object for it).

Acronym Control Sequence Name Supported
AMB Absolute Move Baseline
AMI Absolute Move Inline
BLN Begin Line
BSU Begin Suppression
DBR Draw B-axis Rule
DIR Draw I-axis Rule
ESU End Suppression
GAR Glyph Advance Run
GIR Glyph ID Run
GLC Glyph Layout Control
GOR Glyph Offset Run
NOP No Operation
OVS Overstrike
RMB Relative Move Baseline
RMI Relative Move Inline
RPS Repeat String
SBI Set Baseline Increment
SCFL Set Coded Font Local
SEC Set Extended Text Color
SIA Set Intercharacter Adjustment
SIM Set Inline Margin
STC Set Text Color
STO Set Text Orientation
SVI Set Variable Space Character Increment
TBM Temporary Baseline Move
TRN Transparent Data
UCT Unicode Complex Text
USC Underscore

GOCA Drawing Orders

The following table shows which GOCA Drawing Orders are currently supported ("supported" means that afpbox can parse the GOCA Drawing Order and create a specific Java object for it).

Acronym Identifier Drawing Order Name Supported
GBAR X'68' Begin Area
GBCP X'DE' Begin Custom Pattern
GBIMG X'D1' Begin Image at Given Position
GBOX X'C0' Box at Given Position
GCBEZ X'E5' Cubic Bezier Curve at Given Position
GCBIMG X'91' Begin Image at Current Position
GCBOX X'80' Box at Current Position
GCCBEZ X'A5' Cubic Bezier Curve at Current Position
GCCHST X'83' Character String at Current Position
GCFARC X'87' Full Arc at Current Position
GCFLT X'85' Fillet at Current Position
GCHST X'C3' Character String at Given Position
GCLINE X'81' Line at Current Position
GCMRK X'82' Marker at Current Position
GCOMT X'01' Comment
GCPARC X'A3' Partial Arc at Current Position
GCRLINE X'A1' Relative Line at Current Position
GDPT X'DF' Delete Pattern
GEAR X'60' End Area
GECP X'5E' End Custom Pattern
GEIMG X'93' End Image
GEPROL X'3E' End Prolog
GFARC X'C7' Full Arc at Given Position
GFLT X'C5' Fillet at Given Position
GIMD X'92' Image Data
GLGD X'FEDC' Linear Gradient
GLINE X'C1' Line at Given Position
GMRK X'C2' Marker at Given Position
GNOP1 X'00' No-Operation
GPARC X'E3' Partial Arc at Given Position
GRGD X'FEDD' Radial Gradient
GRLINE X'E1' Relative Line at Given Position
GSAP X'22' Set Arc Parameters
GSBMX X'0D' Set Background Mix
GSCA X'34' Set Character Angle
GSCC X'33' Set Character Cell
GSCD X'3A' Set Character Direction
GSCH X'35' Set Character Shear
GSCLT X'20' Set Custom Line Type
GSCOL X'0A' Set Color
GSCP X'21' Set Current Position
GSCR X'39' Set Character Precision
GSCS X'38' Set Character Set
GSECOL X'26' Set Extended Color
GSFLW X'11' Set Fractional Line Width
GSGCH X'04' Segment Characteristics
GSLE X'1A' Set Line End
GSLJ X'1B' Set Line Join
GSLT X'18' Set Line Type
GSLW X'19' Set Line Width
GSMC X'37' Set Marker Cell
GSMP X'3B' Set Marker Precision (obsolete)
GSMS X'3C' Set Marker Set
GSMT X'29' Set Marker Symbol
GSMX X'0C' Set Mix
GSPCOL X'B2' Set Process Color
GSPIK X'43' Set Pick Identifier
GSPRP X'A0' Set Pattern Reference Point
GSPS X'08' Set Pattern Set
GSPT X'28' Set Pattern Symbol
???? X'71' End Segment

Contribute

If you want to contribute to afpbox, you're welcome. But please make sure that your changes keep the quality of afpbox at least at its current level: your contributions should comply with the afpbox coding conventions (formatting etc.) and be validated by JUnit tests.

It is easy to check this: just build the source with Gradle before creating a pull request. The Gradle default tasks run Checkstyle and FindBugs and build the JavaDoc. If everything goes well, you're welcome to create a pull request.

Hint: If you use Eclipse as your IDE, you can simply run gradle eclipse to create the Eclipse project files. Furthermore, you can import the Eclipse formatter settings (see file config/eclipse-formatter.xml) as well as the Eclipse preferences (see file config/eclipse-preferences.epf). They will assist you in formatting the afpbox source code according to the coding conventions in use (no tabs, UTF-8 encoding, indent by 4 spaces, no line longer than 120 characters, etc.).

Manuals

The following reference materials were used to implement this parser:

All of these documents are available on the website of the AFP Consortium.
