Descrambling PDF to Text conversion (long)

Just for future reference, is there a general principle concerning PDF-to-text conversion that can be applied? Not relevant now, as I got the thing repaired, but in the process I was looking for the pinout of the Sony CXD3058AR and only found this reference. From the example I had in front of me, Vss was pins 17, 31, 37, 46, 78, 79, which was easy to establish.

112 pins in total

I'm assuming a pinout of the IC is buried in this gibberish, presumably with something else mixed in that sat horizontally alongside the CXD one on the page (part only).

formatting link

. IC Block Diagrams - BD Board - IC101 CXD3058AR IOVSS0 XTACN WDCK SYSM WFCK LMUT

85

SCOR

EXCK SBSO SQCK SQSO

COUT

SENS

CLOK XLAT

DATA

XPCK

XUGF

XRST

C2PO

ATCK SCLK

112 111 110 109

108 107 106 105 104 103 102 101

100 99

98 97 96 95

94 93

92 91 90 89

88

87

86

Approx. 400 mVp-p

150 Vp-p

3 Vp-p CPU INTERFACE

SERVO INTERFACE

13 µs 2 IC101 ws (FEI) (CD Play mode) qs Q344 (Collector) (REC mode)

30.5 µs ws IC801 qh (CF2)

XTSL

C4M

VDD

VDD

GFS

VSS

84 RMUT 83 IOVDD0

82 AVDD2 D/A COVERTER LPF 81 AOUT2 80 VREFR 79 AVSS2 78 AVSS1 LPF 77 VREFL

76 AOUT1 75 AVDD1 74 XVDD CLOCK GENERATOR PWM GENERATOR DIGITAL CLV SERVO AUTO SEQUENCER 73 XTAI 72 XTAO 71 XVSS

Approx. 200 mVp-p

15 Vp-p

2.9 Vp-p

MIRR 1 DFCT 2 FOK 3 VSS 4 LOCK 5 MDP 6 SSTP 7 IOVSS1 8

MIRR DFCT FOK

13 µs 3 IC101 ek (RFACO) (CD Play mode)

100 ns

SFDR SRDR TFDR TRDR FFDR FRDR

9 10 11 12 13 14

70 IOVSS2 69 TES1 68 TEST DIGITAL OUT 67 DOUT 66 IOVDD2

IOVDD1 15 AVDD0 16 AVSS0 17 SERVO DSP

0.6 Vp-p A/D CONVERTER

65 EMPHI 64 EMPH

E F TEI TEO

18 19 20 21

63 VDD TE D/A INTERFACE SELECTOR ERROR CORRECTOR VC FE 32k RAM 62 BCK 61 PCMD 60 VSS 59 LRCK

4 IC101 us (XTAO) (CD Play mode)

FEI 22 FEO 23 VC 24

EFM DEMODULATOR

3.4 Vp-p

A B C D

25 26 27 28

SUM

SUB CODE PROCESSOR

59 ns

ASYNMMETRY CORRECTOR ATT EQ AMP

DIGITAL PLL DC/DC CONVERTER 58 LRCKI 57 PCMDI

APC

29 30 31 32

33 34

35

36 37 38

39 40 41 42 43 44 45 46 47 48 49 50

51

52 53 54 55

56

RFDCO

RFACO

DDVROUT DDVRSEN

EQ_IN

PDSENS

AC_SUM

AVDD4

AVSS4

RFACI

AVDD3

BIAS ASYI ASYO

VPCO VCTL

AVSS3

CLTV FILO FILI PCO

RFC

LD

AVDD5

AVSS5

DDCR

PD

28

28

BCKI

28 PREVCC
Reply to
N_Cook

...snip....

not sure but how about using a free pdf to doc/txt tool?

formatting link

Reply to
Robert Macy

Missing the point: only the text version is out there, derived from a PDF that is not available.

Trying to reverse engineer one, I did a PDF-to-text conversion myself.

80-pin chip data on PDF, with pins 1 to 24 left to right along the bottom and vertical text: the text read from the right, then anticlockwise. The horizontal text on the left and right edges appears as though scanned from right to left, i.e. swapped over. Also, the grey-tone blocking of Vcc and Gnd made them get lumped together when they appeared in the vertical script.
Reply to
N_Cook

Did you try converting the text file to .DOC first? I have done that a few times with some success.

Reply to
hrhofmann

Why do you need to convert it? You can download and read the whole pdf file from that link. The IC pinouts are quite readable.

Colin @ CATronics

Reply to
Colin Horsley

Not available to me without registering for more unsolicited junk; I had to grab the Google-cached version. I tried proxies etc. but no admittance.

Just in case anyone has cottoned on to the intention of the thread: the following is for an 80-pin device, originally on a good PDF graphic as 24/16/24/16 pins, starting with pin 1 at the lower left corner and running anticlockwise. The horizontal rows of pins have vertical text, with Gnd as white text in a black block and Vcc in grey blocks. This is it as a straight listing:

1 DISCON# 2 VCC 3 GND 4 CLK24 5 GND 6 GND 7 A0 8 A1 9 A2 10 A3 11 A4 12 A5 13 GND 14 GND 15 A6 16 A7 17 GND 18 AGND 19 XIN 20 XOUT 21 AVCC 22 VCC 23 GND 24 EA
25 RESET 26 A8 27 A9 28 A10 29 A11 30 PC0/RxD0 31 PC1/TxD0 32 PC2/INT0# 33 PC3/INT1# 34 A12 35 A13 36 A14 37 A15 38 PC4/T0 39 PC5/T1 40 PC6/WR#
41 PB7/T2out 42 VCC 43 GND 44 PB0/T2 45 PB1/T2EX 46 PB2/RxD1 47 PB3/TxD1 48 D0 49 D1 50 D2 51 D3 52 PB4/INT4 53 PB5/INT5# 54 PB6/INT6 55 PC7/RD# 56 GND 57 D4 58 D5 59 D6 60 D7 61 BKPT 62 VCC 63 GND 64 SDA
65 SCL 66 WAKEUP# 67 NC 68 PA0/T0out 69 PA1/T1out 70 PA2/OE# 71 PA3/CS# 72 GND 73 PA4/FWR# 74 PA5/FRD# 75 PA6/RXD0out 76 PA7/RXD1out 77 USBD- 78 GND 79 USBD+ 80 PSEN#
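For anyone checking a listing like this against the drawing, the 24/16/24/16 arrangement can be turned into a small lookup. This is only a sketch: the side names (bottom, right, top, left) are my assumption, inferred from pin 1 sitting at the lower left corner and the count running anticlockwise, and `pin_side` is a made-up helper name.

```python
# Side lengths in anticlockwise order, starting from pin 1 at the lower-left
# corner as described in the post: 24/16/24/16. The side names are an
# assumption that follows from that description.
SIDES = [("bottom", 24), ("right", 16), ("top", 24), ("left", 16)]


def pin_side(pin, sides=SIDES):
    """Return (side_name, position_along_side) for a 1-based pin number.

    The position counts from 1 in the anticlockwise direction of travel.
    """
    total = sum(count for _, count in sides)
    if not 1 <= pin <= total:
        raise ValueError(f"pin {pin} is outside 1..{total}")
    first = 1  # number of the first pin on the current side
    for name, count in sides:
        if pin < first + count:
            return name, pin - first + 1
        first += count


print(pin_side(1))    # ('bottom', 1)
print(pin_side(25))   # ('right', 1)
print(pin_side(64))   # ('top', 24)
print(pin_side(80))   # ('left', 16)
```

Knowing which side a pin belongs to at least tells you which run of the captured text its label ought to be hiding in.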

And this is the mangled version via Foxit text capture:

80 PQFP 14x20mm label in middle

SDA BKPT PB2/RxD1 PB1/T2EX PC7/RD# GND VCC D7 D6 D5 D4 GND PB7/T2out PB6/INT6 PB5/INT5# PB4/INT4 D3 D2 D1 D0 PB3/TxD1 PB0/T2 GND VCC

60 59 58 57 56 55 54 53 52 51 50 49 48 46 45 44 43 42 4164 63 62 61 47 PC6/WR#SCL 4065 PC5/T1WAKEUP# 66 39 PC4/T0 NC 3867 PA0/T0out A15 3768 PA1/T1out A14 3669 PA2/OE# A13 3570 PA3/CS# A12 3471 80 PQFP GND PC3/INT1# 3372 PA4/FWR# PC2/INT0# 3273 14x20mm PA5/FRD# PC1/TxD0 3174 PA6/RXD0out PC0/RxD0 3075 PA7/RXD1out A11 2976 A10 USBD- 2877 GND A9 2778 USBD+ A8 2679 PSEN# RESET 2580 123456789 11 15 23 10 12 13 14 16 17 19 20 21 22 24 18 T A6 A7A0 A1 A2 A3 A4 A5 EA XIN VCCVCC GND GND GNDGND GND GND GND AVCC XOU AGND CLK24 DISCON#

I could not find any correlation between the mangled order and word length, or word start or end position.
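One partial regularity does seem to show in the capture: on the left and right edges the pin numbers appear fused into four-digit runs such as 4065, 3768 and 2580, each of which looks like a right-edge pin followed by the left-edge pin on the same row. A minimal sketch of splitting them, assuming the 24/16/24/16 layout (right edge 25 to 40, left edge 65 to 80); `split_fused` is a hypothetical helper and the sample string is a few fragments picked out of the capture by hand, not the whole thing.

```python
import re

RIGHT = range(25, 41)   # right-edge pins in the assumed 24/16/24/16 layout
LEFT = range(65, 81)    # left-edge pins


def split_fused(token):
    """Split a four-digit run such as '3768' into (right_pin, left_pin).

    Returns None when the token is not four digits, or when the two halves
    do not fall on the right and left edges respectively.
    """
    if not re.fullmatch(r"\d{4}", token):
        return None
    right, left = int(token[:2]), int(token[2:])
    if right in RIGHT and left in LEFT:
        return right, left
    return None


# Fragments picked from the Foxit capture, plus '4164' from the top row,
# which should be rejected because 41 is not a right-edge pin.
sample = "PC6/WR#SCL 4065 PA0/T0out A15 3768 PSEN# RESET 2580 4164"
pairs = [split_fused(token) for token in sample.split()]
print([p for p in pairs if p])   # [(40, 65), (37, 68), (25, 80)]
```

This only recovers the numbers; which label goes with which number still has to be worked out by hand, since the label order within a row is not consistent in the capture.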

Reply to
N_Cook

It shows it's probably a hiding to nothing. The same presumably applies to highlighting just a single block of "vertical" text: it will not copy across unscrambled. Both Foxit and Acrobat mangle that PDF-text-to-text conversion; maybe somewhere there is a PDF reader with highlighting, or a PDF-to-text app, that does not mangle.

The pinouts 1 to 24 and 41 to 64 in the above 1-to-80 listing I had to descramble manually.

Reply to
N_Cook

Not that it will help, but the (non)order is likely the order in which it was composed, using individual blocks of text, and it reflects the order of those layers within the PDF. The PDF contains positioning information for vector objects (like blocks of text), which is stripped out when it is converted to text only.

Scott, Dunedin, FL
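If a tool can hand back each piece of text together with its position, rather than text alone, the reading order can in principle be rebuilt by sorting on the coordinates. A rough sketch of the idea, with no PDF library involved: the `Block` record, the `reading_order` function, the tolerance and the coordinates are all made up for illustration, and I have assumed y grows down the page.

```python
from collections import namedtuple

# Hypothetical record for one positioned piece of text. x grows rightwards,
# y grows downwards (an assumed convention for this sketch).
Block = namedtuple("Block", "x y text")


def reading_order(blocks, row_tolerance=3.0):
    """Group blocks into rows by similar y, then read each row left to right."""
    rows = []
    for block in sorted(blocks, key=lambda b: b.y):
        # Start a new row when this block sits clearly below the row's first block.
        if rows and abs(rows[-1][0].y - block.y) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    return [" ".join(b.text for b in sorted(row, key=lambda b: b.x))
            for row in rows]


# Made-up coordinates, listed in a scrambled "composition" order.
blocks = [
    Block(60, 11, "VCC"),
    Block(10, 30, "3"),
    Block(10, 10, "2"),
    Block(60, 31, "GND"),
]
print(reading_order(blocks))   # ['2 VCC', '3 GND']
```

Rotated (vertical) text would need its coordinates swapped before the same sort applies, and once the positions have been thrown away, as in a text-only copy, there is nothing left to sort on.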

Reply to
Scott

Bugmenot usually has a couple of active Scribd IDs .......

I *WONT* run Flash here, but I can assure you PDFs can be downloaded from Scribd (if you have a valid ID & password) without it. Obviously the preview doesn't work! ;-)

--
Ian Malcolm.   London, ENGLAND.  (NEWSGROUP REPLY PREFERRED)
ianm[at]the[dash]malcolms[dot]freeserve[dot]co[dot]uk
[at]=@, [dash]=- & [dot]=. *Warning* HTML & >32K emails --> NUL:
Reply to
IanM

If the text is stored as ASCII you can use any text editor, like emacs, to find the relevant text. I used to do this with WORD files in UNIX as well, until they started using RAM techniques, splattering the text all over the place. There is also a program on UNIX called antiword. And there is a UNIX filtering technique, which I forget, where you strip out the binary and keep only printable runs of two or more characters.

- = - Vasos Panagiotopoulos, Columbia'81+, Reagan, Mozart, Pindus, BioStrategist

formatting link
---{Nothing herein constitutes advice. Everything fully disclaimed.}--- [Homeland Security means private firearms not lazy obstructive guards] [Urb sprawl confounds terror] [Phooey on GUI: Windows for subprime Bimbos]

Reply to
vjp2.at

strings -n2 x.pdf

is the command to strip the binary out of a PDF file on UNIX.
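For anyone without the UNIX tool to hand, roughly the same filter can be sketched in Python. This is an approximation, not a reimplementation of strings: `printable_runs` is a made-up name, I have assumed "printable" means ASCII space through tilde, and the sample bytes are invented.

```python
import re


def printable_runs(data, min_len=2):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Invented sample: a few readable fragments separated by binary bytes.
sample = b"\x00\x01BT\x00/F1 12 Tf\xff(VCC) Tj\x02x\x03"
print(printable_runs(sample))   # ['BT', '/F1 12 Tf', '(VCC) Tj']

# For a real file, read it in binary mode first, e.g.:
#   with open("x.pdf", "rb") as f:
#       runs = printable_runs(f.read())
```

Bear in mind that the page content in most PDFs is stored compressed, so a filter like this tends to turn up only the structural keywords unless the file has been decompressed first.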

Reply to
vjp2.at
