GitHub - tshurlock/fileToPDFConverter: A file converter to convert different file types to pdf

File to PDF converter

11/03/2024

Purpose

This programme takes a zip file and extracts its contents. It then iterates through every file in the extraction directory and all of its subdirectories.

-If the file is a supported type, it is converted to a PDF (in situ) and the original file is deleted.
-If the file is already a PDF, it is left in situ.
-If the file is an unsupported format, it is left in situ.

The end result is that every file in the location chosen for extraction ends up as a PDF, unless its format is unsupported.

Operation

Run main.py.
The user is shown a file explorer window to select a zip file.
The user is then immediately shown a second file explorer window to select the location to extract the zip file to.
A first-pass OS walk checks for .doc files and converts them to .docx (in preparation for PDF conversion).
A second OS walk through the extraction location checks the type of each file.
If the file is a supported image, .txt, .msg or Word file, it is converted to PDF (in situ) and the original file is deleted.
If the file is already a PDF or is an unsupported format, it is left in situ.
A report, 'conversion_to_PDF_report.txt', is produced detailing the date/time and the action taken for each file.
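
The extraction, directory walk and report steps can be sketched roughly as below. This is a minimal illustration using only the standard library, not the code in main.py or extractor.py: the function names `extract_zip`, `list_extracted_files` and `write_report` are hypothetical, and the tab-separated report layout is an assumption, since only the report file name and its general content are described here.

```python
import os
import zipfile
from datetime import datetime


def extract_zip(zip_path, dest_dir):
    """Extract every member of the zip archive into dest_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)


def list_extracted_files(dest_dir):
    """Return the full path of every file under dest_dir, including subdirectories."""
    found = []
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


def write_report(dest_dir, actions, report_name="conversion_to_PDF_report.txt"):
    """Write one timestamped line per (file path, action) pair.

    The line layout (timestamp, path, action separated by tabs) is assumed.
    """
    report_path = os.path.join(dest_dir, report_name)
    with open(report_path, "w", encoding="utf-8") as report:
        for file_path, action in actions:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            report.write(f"{stamp}\t{file_path}\t{action}\n")
    return report_path
```

A caller would run `extract_zip`, collect an action for each path returned by `list_extracted_files`, and pass the resulting pairs to `write_report`.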

WARNING

The zip file should be extracted to a new or empty directory. main.py will attempt to convert every file at the extraction location, so extracting to e.g. the desktop or a documents folder can result in a long run time and unintended loss of documents in their original format. To do: add a try statement to confirm the state of the extraction destination directory to avoid this issue.
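
One possible shape for the destination check described in the to-do is sketched below. It is an assumption about how the check could work, not something the project currently does, and `is_safe_destination` is a hypothetical name. It uses a plain existence and emptiness test rather than a try statement.

```python
import os


def is_safe_destination(dest_dir):
    """Return True if dest_dir does not exist yet, or exists and is an empty directory."""
    if not os.path.exists(dest_dir):
        # A new directory will be created by the extraction step.
        return True
    if not os.path.isdir(dest_dir):
        # The path points at a file, so it cannot be an extraction target.
        return False
    # An existing directory is only safe when it holds no entries at all.
    return not os.listdir(dest_dir)
```

main.py could call this after the second file explorer window and ask the user to pick another location when it returns False.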

main now uses the file explorer to select the zip file and the destination folder, and holds these values. The next step is to run the OS walk in main, which will point each file to the relevant converter so that a single walk is needed, not one for each .py file.
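
The planned single walk could look something like the sketch below, which maps file extensions to converter functions. This is an illustration of the idea only: `convert_tree` and the `converters` dictionary are hypothetical, the action strings are made up, and matching extensions case-insensitively is an assumption.

```python
import os


def convert_tree(dest_dir, converters):
    """Walk dest_dir once and hand each file to the converter registered for its extension.

    converters maps a lowercase extension such as ".txt" to a callable that
    takes a file path. Returns a list of (path, action) pairs for the report.
    """
    results = []
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            path = os.path.join(root, name)
            ext = os.path.splitext(name)[1].lower()
            if ext == ".pdf":
                results.append((path, "already PDF, left in situ"))
            elif ext in converters:
                converters[ext](path)
                results.append((path, "converted to PDF"))
            else:
                results.append((path, "unsupported format, left in situ"))
    return results
```

With this layout, the converter registered for .doc would need to chain the .doc to .docx step and the .docx to PDF step itself, since the separate first pass would no longer exist.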

Action of individual modules

Currently able to convert .txt, .doc, .docx, image files and .msg.

-main; imports extractor.py, which contains the extraction function to extract a zip file, and imports docToDocxConverter, which contains the docToDocx function that converts .doc to .docx and deletes the original (in preparation for .docx to PDF conversion). In future, main will import all the other file type converters so that a single iteration through the directory will identify the file type and then apply the relevant converter.

-extractor; see above

-docToDocxConverter; see above

-imageConverter; converts ('.tif', '.tiff', '.jpg', '.jpeg', '.png', '.gif', '.eps', '.ai', '.psd', '.indd', '.raw') files to PDF

-txtAndMsgToPdfConverter; converts .msg or .txt to PDF; currently hard-coded to a single file

-wordToPDF; converts .docx to PDF

Supported File Types

Supported: .txt, .doc, .docx, .tif, .msg, .jpg, .png, .gif, .jpeg, .tiff

Not applicable: .pdf (conversion not required; PDF is the desired file format), .zip (content will be extracted and individual files converted)

Currently not supported: .xlsx, .rtf, .xls, .htm, .wav, .xps, .xlsm, .html, .pptx, .ppt, .mht, .xltx, .csv, .bmp, .eml, .dotx
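
The three categories above can be expressed as a small lookup, sketched below. The sets copy the "Supported" and "Not applicable" lists as written. The `classify` function is hypothetical, and treating extensions case-insensitively and reporting anything unlisted as "not supported" are assumptions.

```python
import os

# Extensions copied from the "Supported" list above.
SUPPORTED = {".txt", ".doc", ".docx", ".tif", ".msg", ".jpg", ".png", ".gif", ".jpeg", ".tiff"}
# Extensions copied from the "Not applicable" list above.
NOT_APPLICABLE = {".pdf", ".zip"}


def classify(filename):
    """Return 'supported', 'not applicable' or 'not supported' for a file name."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in SUPPORTED:
        return "supported"
    if ext in NOT_APPLICABLE:
        return "not applicable"
    # Anything else, including files with no extension, falls through here.
    return "not supported"
```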

Repository files (14 commits)

.idea
GUI.py
GUIFolder.py
README.md
convert.ico
converts.ico
datee.py
docToDocxConverter.py
excelToPDF.py
extractor.py
imageConverter.py
main.py
mainIterationDraft.py
osWalk.py
stringPrinting.py
testTxtFileWrite.py
txtAndMsgToPdfConverter.py
wordToPDF.py
