Open some.pdf
replace "Replace this!" with "Replaced"
Save some_edited.pdf
I get corrupted files in every encoding method I try. Is there something simple I am missing?
I can do this in Notepad and it works just fine. After messing with it, I believe I am supposed to be using a BinaryReader and a BinaryWriter, but so far I am having a hard time replacing text inside the binary data. I can get a Char() array back from the reader, which gives me each individual character as a separate element. There is also a function to get a string back, but I am not sure how to put it back in the writer correctly.
Anyone have any experience working with files like PDF?
You will probably need a third-party assembly or utility to edit a PDF file. It isn't as easy as editing the bytes directly, because PDF content is usually compressed (typically with Flate/zlib), so you can't just edit the text straight.
That is what I figured before trying to do this, because I knew PDF files could have compression. What I found, though, was that these files have the string I want to replace in plain text. If I open the file in Notepad, make my edit, and save it, it works great.
I just need to replicate that seemingly simple process in code without breaking the formatting of the document or the encoding or anything else.
In that case use a StreamWriter and a StreamReader.
Here is how I am trying to write the files. This makes a file that Acrobat will not open. In Notepad I notice that some of the odd characters at the beginning of the file are stripped out. As far as I can tell, it must be happening on FileStream.ReadToEnd.
Public Function ByteTest()
    Dim PDFFile As String
    Dim PDFFolder As IO.Directory
    Response.Write("Start Byte:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")
    For Each PDFFile In PDFFolder.GetFiles(Server.MapPath("PDF"))
        'Open the file
        Dim FileStream As IO.StreamReader
        FileStream = IO.File.OpenText(PDFFile)
        'Load the file in to a string
        Dim Contents As String = FileStream.ReadToEnd
        'Replace text in string
        Contents = Contents.Replace("ABC1234567890", "ABC1111111111")
        'Close stream
        FileStream.Close()
        'Create byte based output file
        Dim OutputFileName As String = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "BYTE.pdf")
        Dim fs As FileStream = File.Create(OutputFileName)
        fs.Close()
        'Convert the string to bytes
        Dim info As Byte() = New System.Text.UTF8Encoding(True).GetBytes(Contents)
        'Write string as bytes to output file
        fs = File.OpenWrite(OutputFileName)
        fs.Write(info, 0, info.Length)
        fs.Close()
    Next
    Response.Write("Stop Byte:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")
End Function
I also wrote a test not using the bytes and trying several encoders. All of them will not open in Acrobat.
Public Function StringTest()
    Dim PDFFile As String
    Dim PDFFolder As IO.Directory
    Response.Write("Start String:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")
    For Each PDFFile In PDFFolder.GetFiles(Server.MapPath("PDF"))
        'Open the file
        Dim FileStream As IO.StreamReader
        FileStream = IO.File.OpenText(PDFFile)
        'Load the file in to a string
        Dim Contents As String = FileStream.ReadToEnd
        'Replace text in string
        Contents = Contents.Replace("ABC1234567890", "ABC1111111111")
        'Close stream
        FileStream.Close()
        'Create ASCII output file
        Dim OutputFileName As String = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-ASCII.pdf")
        Dim fs As FileStream = File.Create(OutputFileName)
        Dim PDFStream As StreamWriter = New StreamWriter(fs, System.Text.Encoding.ASCII)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()
        'Create BigEndianUnicode output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-BigEndianUnicode.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.BigEndianUnicode)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()
        'Create default formatted output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-Default.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.Default)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()
        'Create Unicode output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-Unicode.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.Unicode)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()
        'Create UTF7 output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-UTF7.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.UTF7)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()
        'Create UTF8 output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-UTF8.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.UTF8)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()
    Next
    Response.Write("Stop String:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")
End Function
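A likely reason both tests fail regardless of the output encoding: IO.File.OpenText reads the file as UTF-8, and the binary parts of a PDF are generally not valid UTF-8, so the damage is already done by the time ReadToEnd returns. The sketch below is only an illustration of that effect, not part of the original code. The sample bytes are a made-up stand-in for the start of a PDF (a header line followed by a comment line of high-bit bytes), and the module and function names are hypothetical. It compares a UTF-8 round trip with a single-byte ISO-8859-1 round trip.

```vbnet
Imports System
Imports System.Text

Module EncodingRoundTripDemo
    Sub Main()
        ' Hypothetical sample: "%PDF-1.4", a line feed, then a comment line of
        ' high-bit bytes similar to what many PDF writers put near the top.
        Dim original As Byte() = {&H25, &H50, &H44, &H46, &H2D, &H31, &H2E, &H34, &HA, _
                                  &H25, &HE2, &HE3, &HCF, &HD3, &HA}

        ' UTF-8: bytes that do not form valid UTF-8 sequences decode to the
        ' replacement character U+FFFD, so re-encoding cannot restore them.
        Dim viaUtf8 As Byte() = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(original))

        ' ISO-8859-1 (code page 28591) maps every byte to one character and back.
        Dim latin1 As Encoding = Encoding.GetEncoding(28591)
        Dim viaLatin1 As Byte() = latin1.GetBytes(latin1.GetString(original))

        Console.WriteLine("UTF-8 round trip identical:   " & SameBytes(original, viaUtf8))
        Console.WriteLine("Latin-1 round trip identical: " & SameBytes(original, viaLatin1))
    End Sub

    ' Compares two byte arrays element by element.
    Function SameBytes(ByVal a As Byte(), ByVal b As Byte()) As Boolean
        If a.Length <> b.Length Then Return False
        For i As Integer = 0 To a.Length - 1
            If a(i) <> b(i) Then Return False
        Next
        Return True
    End Function
End Module
```

If the UTF-8 line reports a mismatch on a sample like this, no choice of StreamWriter encoding afterwards can bring the original bytes back, which would match the stripped characters seen in Notepad.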
I found my answer on a newsgroup posting I made. This code generates working PDF files for me. Thanks to everyone for taking a look at the problem.
Sub ANSITest()
    Dim PDFFile As String
    Dim PDFFolder As IO.Directory
    Dim Encoding As System.Text.Encoding = Encoding.GetEncoding(1252)
    For Each PDFFile In PDFFolder.GetFiles(Server.MapPath("PDF"))
        'Open the file
        Dim FileStream As New IO.StreamReader(PDFFile, Encoding)
        'Load the file in to a string
        Dim Contents As String = FileStream.ReadToEnd
        'Replace text in string
        Contents = Contents.Replace("ABC1234567890", "ABC1111111111")
        'Close stream
        FileStream.Close()
        'Write string as bytes to output file
        Dim OutputFileName As String = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "ANSI.pdf")
        Dim sw As New IO.StreamWriter(OutputFileName, False, Encoding)
        sw.Write(Contents)
        sw.Close()
    Next
End Sub
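For anyone who would rather not go through a text encoding at all, here is a rough sketch of the same replacement done directly on the bytes. It is not the code from the newsgroup answer. It assumes .NET 2.0 or later (for File.ReadAllBytes and File.WriteAllBytes), that the search and replacement strings are plain ASCII, and that they have the same length. The module name, the helper names IndexOfBytes and PatchBytes, and the command-line arguments are all made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BytePatchDemo
    ' Returns the index of the first occurrence of pattern in data
    ' at or after start, or -1 if there is none.
    Function IndexOfBytes(ByVal data As Byte(), ByVal pattern As Byte(), ByVal start As Integer) As Integer
        For i As Integer = start To data.Length - pattern.Length
            Dim matched As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If data(i + j) <> pattern(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return i
        Next
        Return -1
    End Function

    ' Overwrites every occurrence of oldText with newText in place and
    ' returns the number of replacements. Both strings must be ASCII
    ' and the same length, so no other byte in the file moves.
    Function PatchBytes(ByVal data As Byte(), ByVal oldText As String, ByVal newText As String) As Integer
        Dim oldBytes As Byte() = Encoding.ASCII.GetBytes(oldText)
        Dim newBytes As Byte() = Encoding.ASCII.GetBytes(newText)
        If oldBytes.Length = 0 OrElse oldBytes.Length <> newBytes.Length Then
            Throw New ArgumentException("Old and new text must be non-empty and the same length.")
        End If

        Dim count As Integer = 0
        Dim pos As Integer = IndexOfBytes(data, oldBytes, 0)
        While pos >= 0
            Array.Copy(newBytes, 0, data, pos, newBytes.Length)
            count += 1
            pos = IndexOfBytes(data, oldBytes, pos + oldBytes.Length)
        End While
        Return count
    End Function

    Sub Main(ByVal args As String())
        ' args(0) = input PDF, args(1) = output PDF (hypothetical usage)
        Dim data As Byte() = File.ReadAllBytes(args(0))
        Dim replaced As Integer = PatchBytes(data, "ABC1234567890", "ABC1111111111")
        File.WriteAllBytes(args(1), data)
        Console.WriteLine("Replaced " & replaced & " occurrence(s).")
    End Sub
End Module
```

The same-length restriction is deliberate: a PDF's cross-reference table stores byte offsets into the file, so a replacement that changes the length shifts everything after it. Some viewers rebuild the table when they open such a file, but that is not something to rely on.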