IWM reverse engineering

March 11, 2024 - 4:15pm

UncleBernie

Joined: Apr 1 2020 - 16:46

Posts: 875

IWM reverse engineering

After probing the waters in this thread:

https://www.applefritter.com/content/how-important-are-iwm-features-apple-iic

I finally got enough motivation to proceed with reverse engineering the IWM myself. I'm well aware that others have done that long ago, as is obvious from the above thread, but everything I found on the usual sites like github is Verilog or VHDL based, and I can't use that with my much older (early 1980s era) tools for the smaller PLDs I use. ispLEVER of course could do Verilog / VHDL but that tower PC is down and needs repair. And having an IWM in an ispLSI1016 would not help me much, as I only have a few of them. And no, I will never again buy any ICs from Chinese vendors. Most are fake and a total waste of time and money. Smaller PLDs I have in huge numbers. And they never left the USA since they were made.

Here is the current state of the IWM reverse engineering work:

Over the weekend, I've finally built a hardware rig to exercise the real IWM, see here:

PDRM4588_mId.jpg

Illuminated yellow LED No. "0" means stepper motor phase 0 is on, and the green LED illuminated means that drive #1 is selected and its motor is running (of course, there is no floppy disk drive attached). You may notice that the hardware is based on the MMU test rig I used for this work:

https://www.applefritter.com/content/uncle-bernies-iou-and-mmu-substitutes-plds

... just that I pulled some TTLs, and added a DIL-28 socket for the IWM. What is left in terms of TTLs is a 74LS14 hex Schmitt Trigger and a 74LS374 octal DFF (hidden under the grey probe clip). The '374 produces all the slower input signals for the IWM, such as RESET, RDDATA, SENSE, A3, A2, A1, A0. The faster signals like the FCLK and Q3 clocks and the /DEV device enable signal are generated directly from the control port of the LPT printer port, and go through the 74LS14. Here is how FCLK and Q3 look on a scope:

PDRM4586_mid.jpg

The time base is 2us, so the clock period of 4us at FCLK is as fast as I can get when generating all the timing by software. What you can see on the scope is one CPU cycle. The lower trace is a marker signal for CPU cycle start. The CPU cycle ends with the last FCLK falling.

The software running on a notebook computer under DOS contains a routine which feeds any desired RDDATA bit stream to the IWM. By reading the shift register of the IWM (every CPU cycle) I can tell exactly how the data separator works.

This primitive rig comprising only two TTLs and a DIL-28 socket for the IWM has all the capabilities needed to explore all modes of the IWM, which are: 7M or 8M clock, SLOW or FAST bit cells, synchronous or asynchronous mode (and PORT mode), and reading and writing of floppy disk data streams.

So from my original plan to reverse engineer the IWM in two weeks, 6 1/2 days have already been consumed (the work started in earnest on 5th of March, 2024). Alas, I had some other things to do in and around the house so it may have been only 3 days of work on the IWM reverse engineering, and the rest were household chores. But within this small amount of time, I have built the test rig and written the low level driver software for it. I'm already getting data out which looks almost as if it works as expected. But a deeper analysis of course involves much more programming work.

Once I have confirmed / corrected my state diagrams for the control state machines I suspect to be within the IWM, I can proceed to synthesize the logic for the IWM substitute and integrate the gate level logic equations into the software driving the rig. Then, the original IWM and my equations will run in lockstep and any difference will be automatically discovered and flagged.
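The lockstep idea described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names `step_fn`, `lockstep_compare`, `toy_ref` and `toy_bad` are made up for this example, and the two toy devices are simple stand-ins, not the real IWM or the actual logic equations.

```c
#include <stddef.h>
#include <stdint.h>

/* A device under comparison: takes this cycle's input pins and returns
 * the data byte it presents. 'state' is private to the device. */
typedef uint8_t (*step_fn)(void *state, uint8_t inputs);

/* Feed the same stimulus to both devices, one cycle at a time.
 * Returns the index of the first cycle whose outputs differ, or -1. */
static long lockstep_compare(step_fn ref, void *ref_state,
                             step_fn model, void *model_state,
                             const uint8_t *stimulus, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        uint8_t a = ref(ref_state, stimulus[i]);
        uint8_t b = model(model_state, stimulus[i]);
        if (a != b)
            return (long)i;   /* flag the first difference */
    }
    return -1;
}

/* Hypothetical toy stand-in for the "real chip": an 8-bit register that
 * shifts left and takes input bit 0 as the new LSB. */
static uint8_t toy_ref(void *state, uint8_t inputs)
{
    uint8_t *sr = (uint8_t *)state;
    *sr = (uint8_t)((*sr << 1) | (inputs & 1));
    return *sr;
}

/* Hypothetical, deliberately wrong model: same register, but bit 7 is
 * never visible at the output. */
static uint8_t toy_bad(void *state, uint8_t inputs)
{
    return (uint8_t)(toy_ref(state, inputs) & 0x7F);
}
```

In a real rig the reference step function would bit-bang the printer port and read back the data bus of the chip instead of calling a toy function; the comparison loop itself would stay the same.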

This reverse engineering approach is a little bit different from the MMU approach, because I already have the RTL (and the logic equations) for the DISK II floppy disk controller. So what is left is to put in the changes for the extra functions within the IWM, and not starting from scratch. This knowledge base also is the reason why I only allocated 2 weeks for reverse engineering the IWM. It's basically the same as the DISK II controller but with two octal latches / pipeline registers added: one to hold the data byte to be written to floppy disk, and one to hold the data byte that was read from floppy disk. In between the two octal latches sits the same shift register as in the DISK II controller (function wise). The octal latches are in transparent mode when the IWM is in the synchronous (legacy) mode where it mimics the DISK II, with a small exception related to the "freeze" period where the shifting stops so the 6502 gets enough time to grab the data byte. These are the little added details I need to figure out.

I'm very curious about what I will find out about the data separator of the IWM vs. the DISK II. If it's not the same STG in legacy mode, then they (Apple) did shoot themselves in the foot (again). Because the original STG seen in the DISK II has some ugly warts which may or may not be exploited by copyprotection formats. In my exploration work of the DISK II I have discovered that these "ugly warts" can be removed from the STG and the resulting floppy disk controller is cleaner and works fine with all "official" Apple GCR, but I doubt it may work with all copyprotections out there, so I kept these "warts" in my PLD substitutes for the original DISK II controller. And I also intend to put the same STG into my own flavor of IWM substitute. And switch it to another STG in all the other modes (all non-DISK II modes).

Stay tuned !

- Uncle Bernie

March 11, 2024 - 4:34pm

retro_devices

Joined: Jan 31 2024 - 06:40

Posts: 62

https://github.com/mamedev

https://github.com/mamedev/mame/blob/master/src/devices/machine/iwm.cpp

March 11, 2024 - 9:42pm

UncleBernie

Of course I did look at the mame code, too ...

... but it's C++ and not any synthesizable RTL. It could be analyzed and then rewritten as synthesizable RTL, but this may take longer than starting from scratch, based on the real hardware. It's also about the (software) infrastructure needed around the iwm.cpp core to make it work. No way to run MAME on a 640 kByte RAM DOS machine.

What I want is a synthesizable IWM which is based on slight changes/additions to my existing RTL and test code base for the DISK II. The work of the MAME coders is too far away from the actual hardware to make that a viable path for me.

But maybe one of those MAME coders could answer the question about the compatibility of the IWM vs. the DISK II controller when it comes to copy protected Apple II floppy disks. I already have found enough differences that I suspect there may be an issue with at least some copy protections not working on the IWM (see my next post).

- Uncle Bernie

March 11, 2024 - 9:47pm

UncleBernie

Down the IWM rabbit hole (or: Uncle Bernie in Wonderland).

Running the first disk nibble / data separator tests, I already found evidence that the IWM is a different animal than the DISK II controller. This may have profound and dire consequences for copyprotected Apple II disks.

The most striking difference is the behaviour of the MSB when reading from floppy disk. In the DISK II, the shift register ("SR") shifts in data to the left until it is "full", meaning the MSB is set. A program looking at the SR contents can see the MSB coming as it gets shifted through the SR. When the MSB is set, the shift register in the DISK II stops shifting while the data separator waits for the next "1" bit coming from the floppy disk. This typically takes 4 us. Then it watches for the next 6 us (12 Q3 cycles) to see if another "1" comes. Only after that time does it clear the SR and shift in "11" or "10", depending on what it saw. This temporary "stall" of the SR gives the 6502 just enough time to capture the finished disk byte which always has the MSB set.

The IWM does it differently. At the moment the bit in position #6 should be shifted into the MSB (bit #7), it disappears for exactly one FCLK cycle. Then it reappears, and the shift register stalls as in the DISK II.

Assuming we read in a bit sequence 11111001 (0xF9) from the floppy disk, the DISK II will produce:

0x7C 0xF9 0xF9 .... (stall)

while the IWM will produce:

0x7C 0x79 0xF9 .... (stall)
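The two readout sequences above can be reproduced with a small C sketch. It only mimics the observed values and is not a model of the actual circuitry: the function names are invented for this example, and the single sample with a hidden MSB is an assumption that stands in for the one FCLK gap. The repeated values during the stall are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift one bit into an 8-bit read shift register (shifts left,
 * new bit enters at the LSB). */
static uint8_t sr_shift(uint8_t sr, int bit)
{
    return (uint8_t)((sr << 1) | (bit & 1));
}

/* Values a program would read around the shift that completes a disk
 * byte. If iwm_style is nonzero and the shift sets the MSB, one extra
 * sample is inserted in which the MSB is still hidden (assumption: this
 * reproduces only the observed gap, not its cause).
 * Returns the number of samples written to out[] (2 or 3). */
static size_t read_samples(uint8_t sr_before, int last_bit,
                           int iwm_style, uint8_t out[3])
{
    size_t n = 0;
    uint8_t sr_after = sr_shift(sr_before, last_bit);

    out[n++] = sr_before;
    if (iwm_style && (sr_after & 0x80))
        out[n++] = (uint8_t)(sr_after & 0x7F);  /* MSB "disappears" once */
    out[n++] = sr_after;
    return n;
}
```

Calling `read_samples(0x7C, 1, 0, out)` gives the DISK II style pair 0x7C, 0xF9, while `read_samples(0x7C, 1, 1, out)` gives the IWM style triple 0x7C, 0x79, 0xF9.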

I can make the conjecture that this behaviour is intentional, because there is a "port mode" in the IWM in which /DEV (device select) is permanently asserted and the MSB (D7) is used as a clock signal whose rising edge will pump the data byte stream into some downstream data pipeline. The "disappearing" MSB generates a data setup time for the first register of this data path, which is exactly one FCLK cycle long. Otherwise the "port mode" would not work or it would need an external DFF to delay the MSB rising edge by the appropriate amount.

In Apple document 19840288 which can be found on the web, there is a warning: "... IWM will not work reliably in synchronous mode when using a NMOS 6502 ... but it will work with the WDC 65C02." The next paragraph of said document then goes on saying that "... IWM may change DB7 just before the end of the read window of the CPU ..."

I think that this bug is rooted in the design decision to have the "disappearing" MSB to make "port mode" work without an external DFF. What a shame ! This weird behaviour should only be present in "port mode" which is defined as an asynchronous mode (ASYNC = 1) and LATCH = 0, where ASYNC and LATCH are bits #1 and #0 of the IWM mode register that is so cunningly hidden from inadvertent access by legacy Apple II software.

I would think that in legacy modes (ASYNC = 0, LATCH = 0) which are supposed to be DISK II compatible the "disappearing" MSB should not happen. But it does.

IMHO Apple should have fixed that bug in a later revision of the IWM. So that the IWM would also work with NMOS 6502. But in the specimen I have (Apple part #344-0041-A) this evil MSB behaviour definitely is present also in synchronous mode.

There also seem to be some other, subtle timing differences in how the IWM SR shifts, but so far my software is not yet in shape to probe into that.

Not sure if such differences, however small, could affect copyprotections and make the load of legacy Apple II copyprotected floppy disks fail. Maybe somebody more knowledgeable about this topic could make a comment. In other words: are there Apple II copy protections known which fail to load on an Apple IIc but work with an Apple IIe, and not being caused by differences in the firmware between the two machines ?

- Uncle Bernie

Here is a screen shot showing the disappearing MSB:

PDRM4589.JPG

March 12, 2024 - 12:55am

retro_devices

All IWMs (about 5 pcs., incl. one burnt) I have around in several serviced devices are 344-0041-B. I wonder where you got your -A version from? In my opinion the -A variant is very rare, and the -B is most likely the result of bug(s) corrected by Apple.

March 12, 2024 - 5:22pm

UncleBernie

Oooops - two versions of the IWM exist and cause extra work !

In post #5, retro_devices wrote:

" All IWMs (about 5 pcs., incl. one burnt) I have around in several serviced devices are 344-0041-B. I wonder where you got your -A version from? In my opinion the -A variant is very rare, and the -B is most likely the result of bug(s) corrected by Apple. "

Uncle Bernie answers:

My IWM (A version) came out of a very early Apple IIc. I was not aware that these are so rare. Of course, further down the road I also need a "B" version to see what the differences between "A" and "B" might be. But at the moment I'm fine with the "A" version, it is good enough to develop and debug my code. You see, part of the "secret sauce" I use for such reverse engineering tasks is to have a hardware rig in which the original resides, driven by software running under DOS (no Windows or Linux can do that ;-) and within the software there also lives the RTL / logic equations which, when the software is complete, run concurrently with the real IC. And any differences get flagged. This setup (once complete) allows a very thorough automatic compare of the RTL to the real hardware.

I ran about 50 copyprotected games (WOZ files on a BMOW floppy emu) and about 20 copyprotected real floppy disks (all originals !) on this Apple IIc before I took out the IWM it had. And as far as I can tell, none of them refused to work, except for the original of "Centipede" by Atarisoft, which seems to be damaged even though it came out of an unopened shrink wrapped box. The WOZ file of the same game works.

So the "A" version of the IWM may not be too bad, at least in the legacy mode. The other modes may have bugs, I'm not there yet. But in the long run I will need to find a way to get a "B" version of the IWM and an early Mac in which this IWM was used. I couldn't care less about my IWM substitute having working "Mac" modes never used in an Apple IIe or IIc (after all this is for my Replica 2e project) but as long as it's not excessive additional work, I prefer to do a complete job before I take the test rig apart again. Its destiny is to morph into a card that plugs into the Apple-1 and will have the final versions of all my Apple-1 add-ons (DRAM expansion, Apple II compatible color graphics, and the floppy disk controller). Normally I don't repurpose such cards but this one is rare and hard to find in good unused condition.

- Uncle Bernie

March 12, 2024 - 6:30pm

softwarejanitor

Joined: Jul 5 2018 - 09:44

Posts: 2587

" You see, part of the "secret sauce" I use for such reverse engineering tasks is to have a hardware rig in which the original resides, driven by software running under DOS (no Windows or Linux can do that ;-)"

Actually you can do anything under Linux you could do with MS-DOS, you'd just need to do things a little differently. Something like an LKM (Loadable Kernel Module) would be one approach for doing low level hardware interfacing. Theoretically you could also write a Windows device driver, but Microsoft's closed source model basically makes that out of reach for anyone other than big corporations these days.

FWIW, part of the reason I know it can be done in Linux is that all the test rigs at the place I am currently working at (a major fabless semiconductor company) run Linux. They only use Windows for end stage testing because they can't tailor and control it enough to do the early stage low level hardware testing on CPUs and GPUs.

March 13, 2024 - 3:34am

retro_devices

softwarejanitor wrote:
" You see, part of the "secret sauce" I use for such reverse engineering tasks is to have a hardware rig in which the original resides, driven by software running under DOS (no Windows or Linux can do that ;-)"

Actually you can do anything under Linux you could do with MS-DOS, you'd just need to do things a little differently. Something like an LKM (Loadable Kernel

The pen is not important for the writer, it is the written that is important.

Linux is not an RTOS. That is not critical for the analysis of this static chip, but there is no point in making one's task of bit-banging the LPT port coexist with a super-duper complicated multitasking OS. DOS is more than fine and has less execution jitter than Linux. I would have used DOS too. There is no need to study and comply with the huge number of restrictions of a full-fledged multitasking OS, let alone to study its way of writing kernel-level drivers, just to achieve a few simple I/O operations that in essence are equal to single CPU instructions.

March 13, 2024 - 4:08pm

UncleBernie

Bit banging on Linux

In post #8, 'retro_devices' wrote:

" DOS is more than fine and has less execution jitter than linux ..."

Uncle Bernie comments:

Spot on. The execution jitter caused by preemptive multitasking systems is really bad, and foils oscilloscope work. This is why I don't use Linux for this kind of reverse engineering work, even though I know how to bit-bang an LPT port under Linux. It does not even need a special kernel, just some system calls to gain access to the port (only possible if the program has root privileges). But there is no way to disable the interrupts in Linux (at least I did not find a way to do that).

DOS, with all its limitations, is perfect for that kind of work. I use MS C7.00 and it even has a library function to enable/disable interrupts.

Forget Windows for any such low level hardware work. Not worth the effort / learning curve dealing with their closed system. It's mental poison.

Yet another topic is building hardware add-ons which require fast data transfer rates. I have a cute little development from the 1990s which was later named "Ratweasel" after I learned that a "Greaseweazle" exists. The pun is intended. Cross a rat with a weasel and you get something that is really lean and mean. The downside with this development is that it requires ECP / DMA mode for the parallel printer port from so-called "Super I/O" chipsets to make it work. These can only be found in older laptops. It's a pity that technical progress has made this development obsolete. It can read and write flux transition streams with 50ns timing resolution (plenty of resolution even for 2us bit cells / MFM) and interfaces to Apple II disk drives (so all the half tracks and quarter tracks can be reached). It could read and write any format if the software to analyze and synthesize flux streams were written or ported to it. I never had the time or motivation to do that. So I'll wait until the Applesauce guys make some new hardware that is cheap enough to buy. "Ratweasel" is only 3 cheap TTLs and 2 GAL16V8. This is what I think is "lean and mean" ;-)

This said, here is the problem: if you want to do anything like that with today's computers you need USB and then you need a complete 32 bit computer in that hardware to run the USB protocol stack. The "Greaseweazle" uses a cheap ST microcontroller but does not support Apple II floppy disk drive interfaces. What I fear is that whatever next generation of Applesauce they might brew up, it might be too expensive for casual use. Over the past decades, there were heinously expensive flux engines around, huge PCBs with lots of ICs and an FPGA on them. I wonder why they made them so expensive and used more than 3 TTLs and 2 GALs, all costing less than $5. Seems that lack of brains can be compensated by throwing lots of expensive ICs at a mission. The "Greaseweazle" is definitely the smartest solution, only brain, almost no hardware needed. Hence, cheap.

But as it turned out, for the IWM reverse engineering work I don't need a flux engine. I have another ongoing project of lesser priority, and this is reverse engineering of the Western Digital FDCs, including those with analog PLLs. This can't be done without a flux engine. At the moment the rig is such that "Ratweasel" plugged into a first DOS notebook makes a flux stream which goes into a WDC177x plugged into a second DOS notebook. This allows me to explore the digital data separator of the WDC177x family and once that is done, I can proceed to the FDCs with analog PLL based data separators. Of course, the WDC177x exploration could be done with a rig similar to the one for the IWM. But when I make the clocks by software (at a much lower speed) then I can't use it to read a real floppy disk, which is part of the mission.

You can see that I am very interested in this floppy disk controller stuff. I hope I can complete all the work before all my floppy disks have deteriorated to a point where they are useless.

As for the IWM work, I'm currently exploring the read windows and already have found a lot of head scratching behavior patterns NOT compatible with what the DISK II controller does.

- Uncle Bernie

March 15, 2024 - 9:44pm

#10

S.Elliott

Joined: Jun 23 2022 - 16:26

Posts: 207

A tangible example archived for posterity!

UncleBernie wrote:
I ran about 50 copyprotected games (WOZ files on a BMOW floppy emu) and about 20 copyprotected real floppy disks (all originals !) on this Apple IIc before I took out the IWM it had. And as far as I can tell, none of them refused to work, except for the original of "Centipede" by Atarisoft, which seems to be damaged even though it came out of an unopened shrink wrapped box. The WOZ file of the same game works.

Even when the IWM reproduced copy-protection signatures correctly, sometimes the detection logic needed to be tweaked to get the desired result from the IWM.

When Jordan Mechner archived the source code for Prince of Persia, it included the copy-protection logic. In POPBOOT0.S at line 364 there's a comment from someone (Roland?) who was troubleshooting triple-E7 copy protection with an Apple //c. (Source code here.)

PoP IWM comment.png

Annotated source code from Prince of Persia's copy protection check

That comment on line 364 suggests that the copy-protection had to be modified to make its data-register-synchronization trick suitable for the Apple //c. And that shouldn't be a surprise because the Disk II had a single register for both data and WP-status, whereas the IWM has separate registers. On the Disk II, merely activating WP-sense (with a write-protected disk) would instantly reset the state machine and force the data register to be cleared. Apparently the IWM was ultimately capable of doing that, but it took a little extra tweaking to make it work.

For what it's worth, Bank Street Music Writer used that same triple-E7 copy protection scheme, but it booted unreliably on my roommate's ROM 255 Apple //c. I'll bet they used a detection routine that omitted the fix [hack] shown in the source code above, and thus their detection only worked intermittently with the IWM.

March 16, 2024 - 10:26am

#11

baldrick

Joined: Apr 26 2016 - 08:36

Posts: 681

S.Elliott wrote:
For what it's worth, Bank Street Music Writer used that same triple-E7 copy protection scheme, but it booted unreliably on my roommate's ROM 255 Apple //c. I'll bet they used a detection routine that omitted the fix [hack] shown in the source code above, and thus their detection only worked intermittently with the IWM.

I remember back in the 80s that the Apple IIc was "not 100% compatible" with the Apple IIe.

This may have had something to do with it.

March 16, 2024 - 1:03pm

#12

UncleBernie

IWM incompatibility with some Apple II copyprotection schemes

In post #10, 'S.Elliott' wrote:

" For what it's worth, Bank Street Music Writer used that same triple-E7 copy protection scheme, but it booted unreliably on my roommate's ROM 255 Apple //c. I'll bet they used a detection routine that omitted the fix [hack] shown in the source code above, and thus their detection only worked intermittently with the IWM."

Uncle Bernie thanks you !

This insight from long ago is very valuable for my work on the IWM substitute, because I now know that I had better include a "DISK II" compatibility mode, along with a fix for the unreliable operation of the IWM with a NMOS 6502, the reason for which is well documented by Apple. This is not a design bug, as I see it; it was put in intentionally to make the PORT MODE work reliably with good data setup/hold time. They (Apple) shot themselves in the foot with that noble intent because the same timing trick to make PORT MODE work without an extra external DFF (how much would that have cost them ?) screwed up the data setup/hold time for the NMOS 6502. The CMOS 6502 would 'regenerate' the crumbling data on the data bus internally, and so it worked with the IWM ... barely, I think.

During the two days of snow storm here in Colorado Springs I could not leave the house and so I could not do the required research on the internet to solve another riddle I found in the IWM. As I planned in post #1, reverse engineering the IWM should have been smooth sailing, no more than 2 weeks planned for it, because of two factors:

a) I have a complete RTL solution for the DISK II with clock cycle exact "C" language models and test benches

b) I had a printout of U.S.-Pat. 4,742,448 which describes the IWM in great detail, and had already coded the "stuff" seen in the patent as "C" models

So I thought it's gonna be a quick job.

But I soon discovered erratic behaviour of the IWM in the test rig: the same test run several times would sometimes fail, which was traced to the fact that the RESET pin of the IWM does NOT reset everything and does NOT bring it into a defined state. In hindsight, this should have been obvious from the '448 patent Fig. 2 --- in the figure, RESET only goes to two blocks, and the read circuitry has no RESET (ouch).

What is worse is that it does not use Q3 as a timing signal for the SLOW modes. Instead, it has an internal divider (most likely, a toggle FF) which divides FCLK by 2. The state of this internal FF is erratic and unknown and it is not affected by RESET. So some software tricks must deduce the state of this FF by observing the response of the IWM to the challenges. And then force synchronization of the "C" models to the presumed internal state of the IWM.
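One way to picture the "deduce the hidden FF state" trick is to run the model once for each possible starting phase and keep the phase that reproduces what was observed. The C sketch below is a toy illustration only: the "device" is a hypothetical counter that advances on every second FCLK, and the names `run_model` and `deduce_phase` are invented for this example. It is not the actual synchronization code and says nothing about the real internals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model (assumption): a toggle FF divides FCLK by 2, and a counter
 * advances only on the FCLK cycles where the FF ends up at 1.
 * out[i] is the counter value visible after FCLK cycle i. */
static void run_model(int initial_phase, size_t n, uint8_t *out)
{
    int ff = initial_phase & 1;
    uint8_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        ff ^= 1;            /* divider toggles on every FCLK */
        if (ff)
            count++;        /* slow-rate logic advances every 2nd FCLK */
        out[i] = count;
    }
}

/* Try both possible initial FF states against the observed outputs.
 * Returns 0 or 1 if exactly one phase matches, -1 if none or both do.
 * Only the first 64 observations are used. */
static int deduce_phase(const uint8_t *observed, size_t n)
{
    uint8_t buf[64];
    int phase, match = -1;

    if (n > sizeof buf)
        n = sizeof buf;
    for (phase = 0; phase < 2; phase++) {
        run_model(phase, n, buf);
        if (memcmp(buf, observed, n) == 0) {
            if (match >= 0)
                return -1;  /* ambiguous: both phases fit */
            match = phase;
        }
    }
    return match;
}
```

Once the phase is known, the model can be restarted with that phase so that it runs in step with the observed device from then on.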

And this is greatly complicated by the fact that the update of the read data register from the internal shift register IS NOT happening as described by the '448 patent. The '448 patent claims that the update happens after each shift operation. This is not seen in the real IWM on the lab bench. It does indeed update the read data register after each "0" shifted in, but the update after a "1" shifted in happens with a delay. So far I have not found out with 100% certainty how that delay is controlled (but I have an idea). This behaviour for "1" shifted in however is more consistent with the behaviour of the DISK II controller, which takes 4 x Q3 cycles (2 CPU cycles) from detection of a negative transition on RDDATA to the "1" being shifted into the shift register. However, the delay observed in the IWM is much longer than that. (Note that the "rules" for the case when a disk byte is complete, MSB of SR set, are different --- the above discussion is for what happens before the MSB is set).

So the IWM Rev "A" does not do what is claimed in the '448 patent. This patent has quite an interesting story that was not told (AFAIK):

U.S.-Pat. 4,742,448

Inventors: Wendell B. Sander, Robert Bailey

Assignee: Apple Computer, Inc.

Filed: Dec 18, 1986

Continuation of Ser.No. 573,067 Jan 24, 1984, abandoned.

The '448 patent was granted May 3, 1988, shortly before the Apple IIc was discontinued in August the same year. (ouch !)

I am going to look into the Application of Jan 24, 1984 to see if there are any differences in the description of the inner workings (it turned out the new search system of the USPTO does not find it, but there are other ways). This is the literature research work that was greatly hampered by the snow storm. But as long as internet service providers spy on their customers, collect all the webpages visited, and build a "personality profile" they sell to third parties, I refuse to have internet at home, linked to my person, and must use anonymous public wifi. Of course with a specially tailored notebook computer which has zero personal stuff on it. Not good enough to protect me against malicious state actors, but probably good enough that internet providers can't build a 'personality profile' for me which they then sell to marketing or plain criminal organizations. The internet is both a curse and a blessing ... I remember a past when patent research involved travel to the patent office.

- Uncle Bernie

March 19, 2024 - 12:46pm

#13

UncleBernie

Some progress was made !

I now have a 'C' model which runs in lockstep with the original IWM in the test rig, when being fed with 'possible' GCR codes (including gross timing variations; the data separators finally seem to be the same). But it still fails (different data seen on the outputs) when I feed it with 'impossible' GCR codes. It seems that the system is particularly sensitive to '1' bit cells being too close together. This can't happen on normal magnetic media based floppy disks due to the limitations of the read amplifier, but it might happen with certain copy protections based on 'flux holes', zones with no flux change or no magnetization. This is bad news because it means there still is a difference between the real IWM and my "C" model. The good news is that despite the different outputs seen under these strange conditions, in most cases the 'lockstep' will be re-established by the time the MSB is set and indicates a disk byte to be ready for the RWTS code. I'm now focusing on finding the reasons / exact mechanisms where this recovery does not happen. This should give enough insight into how the "C" model must be improved until 100% state machine equivalence is reached for the read mode.

For the write mode, I already have a "C" model that nicely runs in lockstep with the original IWM. No issues there seen yet.

Alas, the two weeks I've allocated for the reverse engineering of the IWM are over now, and I must spend more time on other things. And I don't even have RTL yet, it's still "C" models. If I had a boss, he would now start yelling at me. If he was a psycho. (Most bosses are, only few are OK).

- Uncle Bernie

March 22, 2024 - 4:10pm

#14

UncleBernie

Weird behavior of the IWM read channel for close RDDATA pulses

As mentioned above in the prior post, I do have a "C" model of the IWM read channel which matches the behaviour of the 'real' IWM in the test rig 100%, as long as I feed it with valid, honest, real-world Apple-ish GCR flux streams. I can even add some complications like longer streams of no flux, and the "C" model still stays in lockstep with the 'real' IWM.

Where it gets weird is when pathological flux patterns are being fed into the RDDATA input of the IWM. These violate Apple GCR rules by providing more flux changes (active transitions on RDDATA) than would be "normal". It is, however, possible to produce such flux change / RDDATA pulse patterns by using a MFM speed rated floppy disk drive. Or one which is fit for the 'FAST' mode used by Apple in the Mac (2 us bit cells).

'Normal', that is, 'SLOW' mode Apple GCR has 4us bit cells which correspond to 28 FCLK cycles of the IWM when it runs in 'SLOW' and '7M' mode. I call this the 'legacy' mode as it is supposed to behave the same as the DISK II controller (it doesn't, though). This is the configuration of the IWM in the Apple IIc.
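To make the bit cell arithmetic concrete, here is a small C sketch that turns one disk byte into RDDATA pulse times measured in FCLK cycles. It is an illustration only: the helper name `byte_to_pulses` is invented, and placing each pulse at the very start of its bit cell is an assumption made for simplicity, not something taken from the rig software.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one disk byte (sent MSB first) into FCLK time stamps of
 * RDDATA pulses. A "1" bit yields one pulse at the start of its bit
 * cell (assumed placement), a "0" bit yields no pulse.
 * 'start' is the time of the first bit cell, 'cell' the cell length
 * in FCLK cycles (28 for SLOW / 7M mode as described above).
 * Returns the number of pulses written to out[] (at most 8). */
static size_t byte_to_pulses(uint8_t byte, long start, long cell,
                             long out[8])
{
    size_t n = 0;
    int bit;

    for (bit = 7; bit >= 0; bit--) {
        if (byte & (1u << bit))
            out[n++] = start;   /* flux transition in this cell */
        start += cell;          /* advance to the next bit cell */
    }
    return n;
}
```

For the byte 0xF9 with `start = 0` and `cell = 28` this produces six pulses at 0, 28, 56, 84, 112 and 196; the two "0" bits show up as the longer gap before the last pulse.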

It boils down to the question: "What happens if the IWM 'sees' faster flux changes (same as closer RDDATA pulses) than expected by the recording?"

I found that under the above conditions (SLOW and 7M) everything behaves as expected as long as the distance between RDDATA pulses is >= 14 FCLK pulses, which corresponds to 14 cycles of 7.159 MHz, or 1.96us . . . you won't get two RDDATA pulses spaced close together like that out of a legacy / single density floppy disk drive. But a higher density floppy disk drive could produce these ~2us RDDATA pulse distances, and any 'gadget' which attaches to the external floppy disk drive connector could pump data at much faster rates, and with much shorter distances between RDDATA pulses. This is why I called these patterns 'pathological'. They should never occur with real floppy disk operations. But the IWM may be used with a 'gadget' not being a floppy disk drive.
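The conversion between FCLK cycles and microseconds used above is a one-line calculation. A minimal C helper, with a name chosen just for this example, could look like this:

```c
/* Duration of a number of FCLK cycles in microseconds.
 * fclk_mhz is the FCLK frequency in MHz (7.159 in the 7M mode
 * discussed above). */
static double fclk_cycles_to_us(unsigned cycles, double fclk_mhz)
{
    return (double)cycles / fclk_mhz;
}
```

With 7.159 MHz, 14 cycles come out at about 1.96 us and a full 28 cycle bit cell at about 3.91 us, which is the nominal "4us" bit cell.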

So I deemed it necessary to explore what the IWM does when such shorter RDDATA distances occur, and I found something weird:

It seems that the 'bit cell timing window' generator has a memory for more than one flux transition. Which means it's likely not a simple counter or state machine. There is more going on within the IWM.

Here is an example:

If I put in RDDATA pulses at FCLK = 45 and 57, then the expected two "1" bits will be seen in the IWM data output byte (which may not be the state of the read shift register at that same instant of time) at FCLK = 67 and 79. The first "0" will appear at FCLK = 107. This is 28 FCLK after the last "1" was seen. So far this behaviour is not weird, it is expected.

But if I put in RDDATA pulses at FCLK = 45 and 55, then the expected two "1" bits will be seen in the IWM data output byte (which may not be the state of the read shift register) at FCLK = 67 and 77. The first "0" will appear at FCLK = 95 (67+28) and the next "0" will be seen at FCLK = 105 (77 + 28). This is 28 FCLK after the respective "1" was seen. This is weird because it would imply each "1" seen starts its own 28 FCLK wide window after it. If it was a simple counter (or state machine) to do the window, I would expect that the 2nd shift "1" at FCLK = 77 would kill / restart the window opened by the 1st shift "1" at FCLK = 67. But it doesn't do that. I have some tentative conjecture on how the circuit doing that may look, and it's either an earlier update of the shift register (less latency to the actual shift operation) combined with a further delayed update of the read data holding register from the shift register, or they did use shift registers to make the variable timing window - which in NMOS are much easier to do than programmable counters.
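To make the two competing expectations explicit, here is a small Python sketch. It is not a model of the IWM, only bookkeeping on the numbers quoted above. The 22 FCLK latency is simply 67 - 45 from the example, and the function names are hypothetical.

```python
WINDOW = 28    # FCLK per bit cell in SLOW/7M mode (from the post)
LATENCY = 22   # FCLK from RDDATA pulse to visible "1" (67 - 45 in the post)

def ones_seen(pulses):
    """FCLK times at which the "1" bits become visible."""
    return [p + LATENCY for p in pulses]

def zeros_single_counter(ones):
    """Hypothesis A: one counter, restarted by every "1".
    Only the last "1" defines when the first "0" appears."""
    return [ones[-1] + WINDOW]

def zeros_window_per_one(ones):
    """Hypothesis B: every "1" opens its own window, none gets cancelled."""
    return [t + WINDOW for t in ones]

for pulses in ([45, 57], [45, 55]):
    ones = ones_seen(pulses)
    print(pulses, "ones:", ones,
          "A:", zeros_single_counter(ones),
          "B:", zeros_window_per_one(ones))
```

For the pulses at 45 and 57, the observed first "0" at 107 is what hypothesis A gives. For the pulses at 45 and 55, the observed "0"s at 95 and 105 are what hypothesis B gives.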

This weird behavior is valid for further reductions of the RDDATA pulse distance down to 6 FCLK cycles. After that the behaviour gets even weirder: at a distance of only 4 FCLK the two RDDATA pulses are still "seen" but the readout from the IWM jumps from $0E to $3B, without the intermediate state $1D. Since a normal deserializer shift register with one input can't jump by 2 positions within one shift register clock, this hints that whatever the IWM puts out for read is not updated after every shift, but only every so often, controlled by a (yet unknown) logic. The IWM patent describes the logic which blocks the update of the read data register while the MSB is set, but this is NOT the same thing as a delayed update while MSB is not yet set.
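The "missing" intermediate state can be shown with a minimal sketch of an ordinary 8-bit deserializer. The helper `shift_in` is hypothetical and assumes new bits enter at bit 0:

```python
def shift_in(reg, bit):
    """One shift of an 8-bit deserializer: move left, new bit enters at bit 0."""
    return ((reg << 1) | bit) & 0xFF

reg = 0x0E
after_one = shift_in(reg, 1)
after_two = shift_in(after_one, 1)
print(f"${reg:02X} -> ${after_one:02X} -> ${after_two:02X}")
```

Getting from $0E to $3B takes two shifts with a "1" each, and $1D is the state in between which the readout never showed.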

RDDATA pulses with a distance of 4 FCLK or less are treated as if only the first pulse were present. This can be explained by how the synchronizer / edge detector works.

I'm not sure if it is worth pursuing this weird behaviour for pathological RDDATA patterns any further. I think it may not matter at all for regular floppy disk operations (or floppy disk emulators). IMHO, they simply should not produce these pathological patterns to be used with SLOW/7M mode. But in the strict sense of 'state machine equivalence' this exact behaviour of the real IWM should also be present in any substitute claiming to be a faithful reproduction of the original IWM behavior.

Seen in a more abstract way, imagine the following:

You have a black box, and a panel with switches and lamps attached to the black box. You are not allowed to look into the black box. But you can put any number of switch combinations in and observe which lamps are lit.

You also know that there could be a specimen "A" or a specimen "B" of a digital device in the black box.

Can you find out which one is in the black box ? Well, put stimuli in (set switches and give a clock pulse) and observe the lights. If you can show a sequence of inputs producing a different output, you have proof that that specimen is different from the other.

The first question, of course, is whether you think it's worth your time to find such a sequence. And the next question, if you choose to spend the time to find such a sequence proving a difference, can you ignore that finding without endangering your whole mission ?

(These are trick questions, of course, please comment what you think)
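For illustration, here is what such a distinguishing sequence looks like when both specimens are fully known. This is a sketch only: it assumes both machines are available as complete transition tables (which is exactly what a real black box does not give you), and the two toy machines "A" and "B" are invented for this example and have nothing to do with the IWM revisions.

```python
from collections import deque

def distinguishing_sequence(a, b, a_start, b_start, inputs=(0, 1)):
    """Breadth-first search over the product of two Mealy machines.
    Each machine maps (state, input) -> (next_state, output).
    Returns the shortest input sequence whose outputs differ, or None."""
    seen = {(a_start, b_start)}
    queue = deque([(a_start, b_start, [])])
    while queue:
        sa, sb, path = queue.popleft()
        for x in inputs:
            na, oa = a[(sa, x)]
            nb, ob = b[(sb, x)]
            if oa != ob:
                return path + [x]
            if (na, nb) not in seen:
                seen.add((na, nb))
                queue.append((na, nb, path + [x]))
    return None

# Toy specimen "A": output 1 only on a "1" directly following a "1".
A = {("s0", 0): ("s0", 0), ("s0", 1): ("s1", 0),
     ("s1", 0): ("s0", 0), ("s1", 1): ("s1", 1)}
# Toy specimen "B": like "A", but it remembers a "1" across one "0".
B = {("s0", 0): ("s0", 0), ("s0", 1): ("s1", 0),
     ("s1", 0): ("s2", 0), ("s1", 1): ("s1", 1),
     ("s2", 0): ("s0", 0), ("s2", 1): ("s1", 1)}

print(distinguishing_sequence(A, B, "s0", "s0"))
```

The two toy machines agree on every input of length one or two and only differ on the input 1, 0, 1. Short random tests can easily miss such a difference.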

CONCLUSION (for now)

One takeaway is for sure: the IWM is NOT state machine equivalent to the DISK II controller, even when the IWM is in SLOW/7M mode. What's worse is that the IWM does NOT have a 'dead time' after detection of a RDDATA pulse, like the DISK II has. All of the above was discovered when looking for that 'dead time' and instead of the expected behavior (dead time) I found something weird.

It will take me a while to ponder over these findings. Maybe somebody else who has already made an 'IWM substitute' can make a comment if this weird behaviour also was seen and whether it is deemed to be worth replicating. (Note that my IWM specimen is the "A" version, not the "B" version --- so far I did not find anything on the internet explaining what the differences between "A" and "B" were).

Comments invited !

- Uncle Bernie

March 23, 2024 - 2:57pm

#15

UncleBernie

Offline

Last seen: 3 days 14 hours ago

Joined: Apr 1 2020 - 16:46

Posts: 875

Apple IWM_middle.jpg

Middle section of IWM die photo with shift register (?) structures

Not having the time nor the motivation to make wall sized printouts of the IWM die photo and annotate every transistor and every net, the die photo is not of much help. So the only thing I can do to proceed is to analyze the data I have obtained further. I did not write the filter yet for the "three close RPD" event cases. Once I have that, I can see what happens with three RPD events within one bit cell window. If only two are recognized then I know there is a simple circuit level implementation to handle all cases of two RPD events in 28 FCLKs. But if all three are recognized, then most likely it is a shift register which triggers the events down the time line. It could just be fed with the RPD events and have taps to cause shift actions. Since the 7/8 mode bit could be handled by selecting different sets of taps by pass transistors, this may allow a much simpler and flexible circuit for sequencing the events than a programmable counter with two sets of preload values and two sets of decoders for actions. Last but not least there is the possibility that they really did use a 6 bit counter as implied by the IWM documentation found on the web. But the real IWM does not wait for 48 counts to decide to shift in a '100' sequence. It shifts these 0's and 1's at certain times related to the 28 FCLK wide timing windows defined by the RPD events.

Is there anybody out there who also has seen this weird behaviour of the real IWM ? (just want to avoid hunting ghosts).

Comments invited !

- Uncle Bernie

March 26, 2024 - 2:43am

#17

retro_devices

Offline

Last seen: 1 day 2 hours ago

Joined: Jan 31 2024 - 06:40

Posts: 62

Could it be that Apple

Could it be that Apple implemented for some reason a linear feedback shift register (LFSR)? If so the stack memory you are looking for won't be in the IWM.

March 26, 2024 - 4:24pm

#18

UncleBernie

Offline

Last seen: 3 days 14 hours ago

Joined: Apr 1 2020 - 16:46

Posts: 875

Answer to the LFSR/Stack question

In post #17, retro_devices asks:

" Could it be that Apple implemented for some reason a linear feedback shift register (LFSR)?

If so the stack memory you are looking for won't be in the IWM. "

Uncle Bernie answers:

An LFSR of this length could be used as a large counter with almost no logic, just shift register stages. The only candidate for such a contraption I can think of is the 1 second timer for the ENBL1 / ENBL2 change delays. It has to count 8 million FCLKs (assuming 8 MHz clock). This could be done with 23 bits and only two feedbacks to the XOR. But most likely, they used some prescaler. I'm not so much interested in how that delay is being made, as in my IWM substitute I will use an RC time delay to do that.
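A minimal sketch of such an LFSR counter in Python, for those who want to play with it. A maximal-length 23-bit LFSR cycles through 2^23 - 1 = 8,388,607 states. The tap pair (23, 18) is an assumption taken from commonly published maximal-length tap tables, not anything known about the IWM, and `lfsr_period` is a made-up helper.

```python
def lfsr_period(nbits, taps, seed=1):
    """Count the steps until a Fibonacci LFSR returns to its seed.
    taps are 1-based bit positions that are XORed to form the new bit.
    The highest tap must equal nbits, otherwise the loop may never end."""
    mask = (1 << nbits) - 1
    state = seed
    steps = 0
    while True:
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1
        state = ((state << 1) | bit) & mask
        steps += 1
        if state == seed:
            return steps

print(lfsr_period(4, (4, 3)))   # small example, runs instantly
# lfsr_period(23, (23, 18)) walks the full 23-bit sequence (slow in Python)
```

The 4-bit example returns after 15 steps, i.e. 2^4 - 1. The same two-tap structure scales to 23 stages with still only one XOR gate.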

No, I'm not looking for a "stack memory" in the IWM. I just mentioned how the peculiar behaviour for closely spaced RDDATA pulses could be implemented in a "C" model. The fact is that the IWM 'remembers' not only the position (in time) of the latest RDDATA pulse but also the position of the RDDATA pulses before it, unless the distance between the pulses is larger than the bit cell window. All these are "pathological cases" not found in real world floppy disk data streams (absent configuration mistakes, such as using a double density drive / media with 2us bit cells in the SLOW mode of the IWM).

So far I have found sixteen different possible "C" models, all of which can run in lockstep with the real IWM for all non-pathological RDDATA streams. But only a few can do the same (running in lockstep) with pathological RDDATA streams having RDDATA pulses with lower than normal spacing. And none of these yield an elegant RTL implementation which is lean and mean. I am quite sure from the calibre of the designers involved in the IWM that their logic is elegant, lean and mean.

You might wonder why I'm so nit-picking about these "pathological" cases to be modelled correctly. This is because at some point down the road I want to invoke automatic reverse engineering tools which are not "AI" so they are dumb and they will, at times, inevitably produce RDDATA test streams which trigger these pathological cases. I wrote these tools more than 35 years ago and forgot so much about how they work that I can't add features which would avoid generation of certain 'prohibited' patterns. Back in the day there never was a case where such 'prohibited' patterns had to be handled. A whole new formal description language would be required to implement that. Unless I would hard code the 'prohibited' cases right into the backtracking algorithms. Which is not a viable option. I'm running out of time.

More about the current state of work in the next post which I prepared offline.

- Uncle Bernie

March 26, 2024 - 4:44pm

#19

UncleBernie

Offline

Last seen: 3 days 14 hours ago

Joined: Apr 1 2020 - 16:46

Posts: 875

More about the IWM reverse engineering process

Some more insights into the IWM reverse engineering process and the progress with it (or the lack thereof):

Superficially, as seen in the IWM patent, the read channel in the IWM looks deceptively simple: like in the DISK II, it is a deserializer shift register controlled by a state machine.

Complications, complications.

One complication over the DISK II is that the IWM adds a data hold register which is clocked at certain times to capture the contents of the deserializer shift register. This hampers any attempt to observe the deserializer shift register contents in real time. There is always a delay between a shift and when the new contents can be "seen" from the outside. This delay can be anywhere between one and several clock cycles after a shift. Or, if the data hold register is a latch that can be transparent, zero clock cycles after a shift.

The main difference to the DISK II is the way the "freeze" of the data upon the MSB being set is implemented. This happens when a disk byte is complete. The data seen by the 6502 cannot change for the duration of the sampling software loop, otherwise data may get unnoticed / lost. In the DISK II, this is done by a side branch of the state machine which does not shift but still waits for a "10" or "11" bit cell sequence. This takes 7-8 CPU clock cycles (a bit cell is 4 us, 4 CPU cycles, but spindle speed variations can shorten the time available). Shortly after the "10" or "11" bit cell sequence was detected, the shift register gets cleared, and the "10" or "11" gets shifted in, ending the "freeze".

In the IWM, there is no such trickery. There is a counter which gets preset (or reset) for each incoming RDDATA pulse. It counts up (or down) every CLK cycle. At certain values of the counter, a shift is initiated. Which may be a "0" or "1" depending on whether a RDDATA pulse occurred since the last shift ("1") or not ("0"). In the IWM, there never is a "freeze" of the deserializer shift register, but the same effect as in the DISK II is accomplished by just not updating the data hold register for a while after the MSB has been set. The update commences after a "1" has been shifted into bit #1 (the second bit, first one is #0) of the previously cleared deserializer shift register, plus a few CLKs of further delay. Looking from the outside, this has almost the same effect as the "freeze" of the DISK II, except that the first new value ($02 or $03) appears instantaneously, without the intermediate values $00 and $01, which cannot be observed on the IWM outputs, but can be seen on the DISK II (although software would struggle to capture all of those for the same MSB event ... a logic analyzer of course would be able to show exactly what happens).
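The update blocking described above can be sketched in Python as follows. This is a strongly simplified illustration, not the IWM logic: it works on a ready-made bit stream, ignores all CLK-level delays, and assumes the deserializer is cleared at the moment the MSB gets set. All names are hypothetical.

```python
def hold_register_trace(bits):
    """Return the data hold register value after every shift.
    Simplified: no CLK-level delays, only the update/blocking rule."""
    shift_reg = 0
    hold = 0
    blocked = False          # True while the completed byte is held
    trace = []
    for bit in bits:
        shift_reg = ((shift_reg << 1) | bit) & 0xFF
        if blocked:
            if shift_reg & 0x02:      # a "1" has reached bit #1
                blocked = False
                hold = shift_reg      # first new value: $02 or $03
        else:
            hold = shift_reg
            if shift_reg & 0x80:      # MSB set: byte complete
                blocked = True
                shift_reg = 0         # deserializer starts over
        trace.append(hold)
    return trace

bits = [int(b) for b in "1101010110101010"]   # $D5 followed by $AA
print(" ".join(f"${v:02X}" for v in hold_register_trace(bits)))
```

In the printed trace, $D5 stays visible for one extra shift and is then followed directly by $02. The values $00 and $01 never show up after the first complete byte.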

To complete the IWM read channel, there is a RDDATA synchronizer and edge detector, which is another shift register clocked by CLK. The first few stages (actual number yet unknown) just synchronize the RDDATA to the CLK. The following stages have at least one gate which discerns a negative edge and provides a signal of one CLK width to a state machine controlling the deserializer shifting, and the update of the data hold register.
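A sketch of such a synchronizer / edge detector in Python, one list element per flipflop. The number of synchronizer stages is unknown, so `sync_stages=2` is purely an assumption, as is the idle-high RDDATA line. The function name is made up.

```python
def detect_falling_edges(rddata_samples, sync_stages=2):
    """Clock RDDATA samples (one per CLK) through a shift register.
    The last two stages form the edge detector: a pulse of one CLK
    width is produced when the newer stage is 0 and the older is 1."""
    stages = [1] * (sync_stages + 1)      # line assumed to idle high
    pulses = []
    for sample in rddata_samples:
        stages = [sample] + stages[:-1]   # one shift per CLK
        falling = stages[-2] == 0 and stages[-1] == 1
        pulses.append(1 if falling else 0)
    return pulses

print(detect_falling_edges([1, 1, 0, 0, 1, 1, 1, 1]))
```

The low phase on RDDATA lasts two CLKs here, but the detector output is high for exactly one CLK, one CLK after the falling edge was first sampled.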

The above is what is known for certain, as it is described in the IWM patent, U.S.-Pat. 4,742,448, which was granted on May 3, 1988 - shortly before the Apple IIc was discontinued in August of the same year (oh the irony). But the IWM lived on in the first Macs. Alas, the description in the '448 patent is lacking a lot of the finer details of how the actual logic in the IWM was implemented.

Lack of observability of inner nodes

You see that the central issue with the various function blocks mentioned above is that they are not directly observable from the outside (other than the data hold register) and they involve a multitude of clocked circuitry with an unknown delay in terms of "number of clocks" between a RDDATA event and some internal action happening - such as presetting / resetting the counter and shifting the deserializer shift register.

Proposed exercise to see that multiple solutions for logic implementation exist

As an exercise, you can draw the block diagram and make some assumptions about what happens when. Then you add a number of stages in the synchronizer and adjust counter preload values and counter states which trigger shifts accordingly, to get the same sequence of events as observed by reading out the data hold register, for the same input sequence at RDDATA.

You can then see that there are many possible solutions. Some of which conform better to border cases (or the "dark corners" of the design spec - what happens if Apple's GCR rules are violated). Alas, none of the documents from Apple which can be found on the web specifies exactly what should happen in such cases.

Border cases

As an example for such a border case, I found out that the real IWM "stalls" once three "0" have been shifted into the deserializer. Apple GCR rules only allow for no more than two "0" in a row. All Apple firmware and formatting conforms to this rule even for the SYNC bytes. There are no exceptions. But still, the IWM accepts three "0" in a row. And then, 28 FCLKs later (in SLOW/7M mode), when the 4th "0" would be shifted into the deserializer shift register, it refuses to do that. It "stalls". No shift happens. But again 28 FCLKs later, it does accept the "0" and shifts it in.
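The observed "stall" can be written down as a simple timeline in Python. This sketch only restates the observation; it does not explain it. Times are in FCLK relative to the last "1", and the behaviour beyond the fifth slot is not described here, so the (hypothetical) function refuses to guess.

```python
WINDOW = 28  # FCLK per bit cell in SLOW/7M mode

def zero_shift_times(last_one, slots=5):
    """FCLK times at which a "0" is shifted in after the last "1".
    Slot 4 is the observed stall; behaviour past slot 5 is not described
    in the post, so it is deliberately not modelled."""
    if slots > 5:
        raise ValueError("behaviour beyond the fifth slot is unknown")
    return [last_one + n * WINDOW for n in range(1, slots + 1) if n != 4]

print(zero_shift_times(0))
```

The gap between the third "0" and the next one is 56 FCLK instead of 28, which is the stall.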

The bit cell window counter

How come ? The simplest explanation is that the IWM has a bit cell window counter which is longer than a bit cell. This is implied by some IWM spec documents (Apple drawing 343-0041-B seems to be the last revision available on the web). This seems to describe the "B" revision of the IWM (I'm working with the "A" revision now). This revision business is to be discussed later ... the important piece of information to explain the three "0" reaction is the table at page 6 of said document, "Read data bit cell window":

IWM_bitcell_Window_snip.JPG

For SLOW/7M mode, the counter intervals are spaced 14 CLKs (28 FCLKs) apart. The table shows only the intervals for valid Apple GCR sequences having no more than two "0" in a row. But by adding "14" to the last row shown (for "100") we can guess how the "1000" came about:

35-48 "100"

49-62 "1000" (conjecture, added)

63 no shift. (Next "0" shift scheduled 14 CLKs = 28 FCLKs later, unless a "1" comes along)

It is seen that if the counter would count further than as shown in the spec, the additional 49-62 window could indeed allow for the creation of a "1000" sequence in the deserializer shift register. But what happens at 63 ? And to which value does the counter roll over ? To 0 or to 7 ? We know that a "0" bit cell after crossing 63 does not cause a shift. The next "0" is only shifted in 28 FCLKs later, which would imply the counter rolls over from 63 to 7 (1 CLK) and the shift would happen at the transition from count 20 to 21 (14 CLKs from 63). The number of FCLKs is twice that, 14 x 2 = 28 (same as the "28" above).

This would explain the observed behaviour with the "stall" after shifting in three "0" in a row. Note that the spec says nothing about the rollover behaviour of the counter, nor does it mention how a "1000" pattern could ever get into the deserializer shift register. At least I could not find that spelled out. These "holes" in a spec are the bane of every IC designer and every test engineer (and reverse engineer, too ;-)

About the "B" revision of the IWM:

Page 4 of said IWM spec spells out that the "B" revision adds a "window" to the edge detector, which I initially suspected to be there in any case, being familiar with various FDC designs. Even the DISK II controller has "dead times" where the state machine ("Woz Machine") does not react to transitions on RDDATA. Here is a snip from the spec:

IWM_B_Revision.JPG

Dead zone window added to the IWM in Revision B

For me, this is good and bad news. The bad news first: this certainly affects some of the "bizarre" behaviour patterns observed with "pathological" GCR sequences (mentioned in post #13 of this thread). The good news is that this difference between the "A" and "B" revisions is only small, and an easy add-on after I have a faithful substitute for the "A" revision I have on the lab bench right now. Still, I would need a working specimen of a "B" revision for this step (any donations ? --- Note that even a "defective" IWM might still work for checking the read channel --- I found that all "blown up" IWMs I desoldered from broken Apple IIc had damaged stepper motor control outputs, and the rest worked fine).

IWM bug not fixed by Apple.

What I find somewhat disturbing is that in the "B" revision Apple did not fix the bug (or "feature") of the IWM which makes it useless for NMOS 6502 based systems. They just put a warning in the spec, explaining what happens. It's a shame. As I see it, it's most likely a side effect of a "PORT MODE" feature which allows them to use the MSB as a clock for a downstream data path, a feature which is only useful for "PORT MODE".

"PORT MODE" is defined as ASYNC mode bit ON and LATCH mode bit OFF. So it would have been easy to disable this feature for the "legacy" (DISK II compatible) mode where the ASYNC mode bit always is OFF.

Too many possible solutions (or non-solutions) how to build an IWM substitute.

So far I have found 16 (sixteen !) different ways to implement the IWM read channel as a "C" model ("C" means the programming language, not yet another IWM revision). All of which pass a "lockstep" test with the real IWM as long as there are no two RDDATA pulses too close together (the "pathological" cases which should not happen with a real floppy disk drive, but who knows which weird things the primitive floppy disk drives Apple has stripped down to the bare bones could do under certain borderline conditions, such as in some copy protection schemes).

This is a real issue. "Full feature" floppy disk drives typically have elaborate digital signal conditioning logic after the read amplifier which guarantees a certain, constant pulse width of the RDDATA pulses, and the absence of erratic pulses coming too soon after a valid one. Apple floppy disk drives (from the DISK II system) don't do that, they route the output signal from the MC3470 read amplifier directly to RDDATA without any such conditioning (the 74LS125 tristate driver does not "condition" anything). But don't get me wrong: this criticism is not meant to slam Apple's floppy disk system design as being bad. Stripping things down to the bare bones makes great sense to get the lowest cost. Which then makes the product more competitive.

Some comment on the merits of the DISK II approach (paragraph may be skipped for those in the know)

The DISK II system, when it came out, was truly revolutionary because it was the cheapest 5.25" floppy disk system on the marketplace, and not by a small margin. This was the first time in known human history that a floppy disk system was affordable for typical microcomputer owners who were private persons or small businesses. Every other computer manufacturer had much more expensive and elaborate solutions. The low cost of the DISK II combined with the Visicalc software was pivotal for the success of the Apple II in the small business world. Without these two winning factors Apple probably would have gone out of business like so many of the other microcomputer companies of the time period did. Viable small business solutions was where the real money could be made. Those microcomputer companies who only catered to hobbyists and gamers were losers, not enough money to be made from that clientele. This is still true today for the hobbyist market segment, but gamers of course turned into a huge market worldwide. Somehow the unemployed masses of adults living in the basement of their parents have to be entertained other than with mind altering drugs of all sorts (legal and illegal). Aldous Huxley predicted this ("Soma" in "Brave New World"). But enough of that. Just wanted to put this in a context of current society, or, to be more precise, the decline of society. Readers of this post in 200 years (if mankind still exists) then can understand under which conditions my work was done. Back to the IWM reverse engineering.

How the verification runs are done.

Here is a screen shot of the result of such a verification run:

PDRM4598_Score.JPG

End of a IWM model verification run (0 errors !)

The "scoreboard" at the bottom shows that after 1 Billion (1e9) FCLK cycles, using random (but non-pathological) RDDATA streams, almost all possible bit combinations have been read from that RDDATA stream, some of which are not even valid Apple GCR. But all of them of course must have MSB set. This is enforced by both the hardware and the firmware in any Apple II system, and the reason why the table starts with $80. You can also see that certain values never get hit. These typically contain more than four "0" in a row - definitely not valid Apple GCR.
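Which byte values fall under the "more than four zeros in a row" criterion can be listed with a short Python sketch. It only enumerates candidates by that one criterion; whether exactly these values are the ones missing on the scoreboard is not claimed here. `longest_zero_run` is a hypothetical helper.

```python
def longest_zero_run(value):
    """Length of the longest run of "0" bits in an 8-bit value."""
    best = run = 0
    for i in range(8):
        if (value >> i) & 1:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best

suspects = [v for v in range(0x80, 0x100) if longest_zero_run(v) > 4]
print(" ".join(f"${v:02X}" for v in suspects))
```

Of the 128 values with MSB set, only eight contain five or more "0" bits in a row.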

Limitations of the approach / theoretical background

With the current test rig I can't do many more clock cycles because of runtime. To reach 1 Trillion (1e12) clock cycles, a run would take several years. With a rig running at full clock speed of 8 MHz this could be reduced to a few months, running 24/7. I once developed a mathematical theory which would estimate the number of clock cycles needed to exercise every state and every transition of a given STG with a given residual uncertainty, when using quality pseudorandom input stimuli, assuming there are no "lockup" states reachable (which would be "booby traps" planted by the designers either intentionally to thwart reverse engineering attempts or unintentionally out of sheer incompetence).

This theory was verified with a number of example state machines, and published in an electronics magazine, and I think it's mathematically sound (there was never a rebuttal from academia), but back in the 1980s computing power was too low to tackle complexities like the IWM. I mention this to give some reason why I chose to use 1 Billion clock cycles for that run, and no less. It's based on my theory, and not on a "gut" feeling. Alas, I had to use some estimates for key parameters of the formula, because the actual implementation of the various state machines and their interaction in the IWM is yet unknown. A "black box". The fact that this state machine essentially has only one input (RDDATA) and that only two bits of the deserializer shift register influence the transitions (other than the trivial shifting itself) allows a reduction of state variables entering the equation. Otherwise the approach using random number based stimuli would be hopeless - for more complex state machines, this brute force approach would have runtimes in the order of the age of the known Universe, billions of years, or even more. Much more. Such is the devastating power of the exponential function which only few people understand (this is why so many people get enslaved by compound interest, just saying "MAFF IS HARD", and lack of brains to tackle "EVIL MAFF" can always be compensated by doing more slave labor --- not my words: "The borrower is the slave of the lender." citation from: Proverbs 22:7-9 English Standard Version 2016 (ESV) --- so the scam / topic is 1000's of years old, and people have been warned by scripture).

In conclusion, it is worth mentioning that any such pseudorandom number based approach to exercising unknown state machines is futile once the state machine gets too complex. The power of the exponential function makes the search space grow beyond the capabilities of our computers.

Possible improvements of the method in case of small PLDs

The algorithm can be greatly improved when the state variables can be readily observed after each clock. Such as on small early PLDs (16R4, 16R6, 16R8, ..., 22V10). This allows the deployment of self-learning algorithms which apply prior learned knowledge about the STG to speed up the search for yet undiscovered transitions and states. The algorithm can also use sophisticated logic reduction algorithms (like ESPRESSO) from time to time to "clean up" the transition functions it has found. Then, based on knowledge about the limitations of the given PLD in terms of product term count per macrocell / output, it can further reduce the search space.

Proprietary software tools for reverse engineering

Back in the 1980s / early 1990s I had written some proprietary CAD tools using these techniques, and some of them were sold commercially with great success. My automatic "black box" reverse engineering tools however never reached the maturity to be a product. And the rise of the complex PLDs put an end to that automatic reverse engineering approach anyways. But it worked fine for simpler PLDs.

Apple TMG HAL automatically reverse engineered

I used these tools to (almost) automatically reverse engineer Apple's TMG HAL, which is a 16R8, and published the results on Applefritter. For more complex designs like the IWM it's hopeless to use the self-learning algorithms which work so fine for small PLDs.

Approach for more complex ICs (like the IWM)

Once I have a match with the brute force approach, and a netlist for that, my tools can automatically produce test vector sets which exercise every aspect of the state machines in the netlist. These test vector sets then can be fed into the real IWM on the test rig to discover differences, if any. Alas, not seeing differences does not prove that the state machine(s) within the IWM are the same as in the netlist (or the RTL which was synthesized into the netlist). All it proves is that the state machine in the netlist/RTL is at least a subset of the STG of the state machine in the real IWM. But if there is a difference found, it's not even a valid subset, but has a flaw. Back to the "drawing board" ... modifying the RTL.

Reason for also implementing "pathological" border cases

Hope this explains why I'm so nit-picking about the "pathological" cases of RDDATA sequences. Without having a match including those, I can't unleash these tools on the subject at hand. First I need a "C" model which gives me a 100% match with the IWM in the test rig, then I can hand code synthesizable RTL following the "C" model, then I can synthesize the logic into a netlist, unleash the tools on that netlist, and feed the result (test vector sets) back into the real IWM sitting in the test rig, to see if there is a difference (= flaw).

This is a lot of work. But I'm getting closer every day.

Comments invited !

(Especially comments from those 2-3 people in the world who already have reverse engineered IWMs - but please don't send me your RTL. Don't be a spoil sport. For me it's a welcome mental exercise for my skills and a good pastime. But if I made a gross mistake you can see from the above descriptions, any comment along the line - "It's not like that" would be welcome. But don't spill the beans !)

- Uncle Bernie

March 26, 2024 - 4:46pm

#20

retro_devices

Offline

Last seen: 1 day 2 hours ago

Joined: Jan 31 2024 - 06:40

Posts: 62

I am inclined to think the

I am inclined to think the IWM emulators some people are claiming to have done (so far) are only partial, they cover only some aspects of the IWM, for example sufficient functionality to be used in //c's. What is the NMOS 6502 problem? In Liron Smartport controllers the IWM works properly in the NMOS 6502 equipped computers.

March 26, 2024 - 5:43pm

#21

UncleBernie

Offline

Last seen: 3 days 14 hours ago

Joined: Apr 1 2020 - 16:46

Posts: 875

The alleged problem of the IWM with NMOS 6502

In post #20, retro_devices asked:

" What is the NMOS 6502 problem ? "

Uncle Bernie answers:

It appears to be a data setup (or hold) time issue involving the MSB when the RX register is read, looking for a valid data byte. The Apple spec for the "B" revision (and some other documents) mention that the 65C02 regenerates insufficient logic levels on the (internal) data bus, but the NMOS 6502 does not (which I can confirm from its transistor level schematic).

These documents claim that this timing bug makes the operation of the IWM with an NMOS 6502 "unreliable".

I don't have the Liron controller card schematic but I think they could have added some ICs to dodge the issue. Look for some register between the IWM and the data bus. They also might have chosen to just fix the MSB (Data bus bit DB7).

- Uncle Bernie

March 27, 2024 - 1:17am

#22

retro_devices

Offline

Last seen: 1 day 2 hours ago

Joined: Jan 31 2024 - 06:40

Posts: 62

The IWM data bus is in

The IWM data bus is in parallel with the card's ROM data bus and passes thru LS245 to the slot's data bus. The LS245 delay could be sufficient to act as register but since the IWM issue is still not entirely clear (to me?) this cannot be concluded.

March 27, 2024 - 1:40am

#23

BusError

Online

Last seen: 1 hour 22 min ago

Joined: Jul 9 2023 - 06:39

Posts: 58

Just pinging in to say that I

Just pinging in to say that I really enjoy your brain dumps UncleBernie! I've only come close to the disk controller when writing my emulator lately, so my knowledge is JUST enough to make sense of a little of it, but, I enjoy reading it anyway, thanks for typing it all ;-)

March 27, 2024 - 4:12pm

#24

UncleBernie

Offline

Last seen: 3 days 14 hours ago

Joined: Apr 1 2020 - 16:46

Posts: 875

On the difference between IWM revision "A" and revision "B"

This is quite interesting. According to Apple's IWM spec for the "B" revision, the "B" revision just adds a "read window" after each RDDATA pulse. This window, which could also be called a "dead zone", prohibits any further RDDATA pulse from entering the internal IWM machinery. In other words, it ignores any 2nd RDDATA pulse coming too soon after an earlier pulse (see the above posts on the topic for more details).

This morning I added the "dead zone" to my existing "C" model and did a few experiments with its effects.

The interesting find is that having such a "dead zone" will remove most of the weird behavior I see with closely spaced RDDATA pulses on the real IWM in the test rig (an "A" revision not having that dead zone / windowing function yet).
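The effect of such a dead zone can be sketched in a few lines of Python. This is only an illustration of the idea and not the IWM's actual logic: the function name, the representation of RDDATA pulses as CLK tick numbers, and the window width of 6 CLKs are all assumptions made for the example.

```python
def apply_dead_zone(pulse_ticks, dead_zone_clks):
    """Drop every RDDATA pulse that falls inside the dead zone of the
    last accepted pulse. Times are CLK tick numbers (hypothetical units)."""
    accepted = []
    last = None
    for t in sorted(pulse_ticks):
        # Assumption: the window starts at the accepted pulse itself and
        # covers the following dead_zone_clks ticks.
        if last is not None and t - last < dead_zone_clks:
            continue  # too close to the previously accepted pulse: ignored
        accepted.append(t)
        last = t
    return accepted


# Pulses at ticks 0, 3, 14, 16 and 28; the ones at 3 and 16 come too soon.
filtered = apply_dead_zone([0, 3, 14, 16, 28], dead_zone_clks=6)
print(filtered)
```

With these made-up numbers the two closely spaced pulses are dropped and only the regularly spaced ones get through, which is the kind of clean-up described above.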

It is very difficult to find some digital logic which would make sense (from an IC designer's point of view) and produce exactly the same behaviour as seen in the "A" revision of the IWM. The best solution for that riddle which I found was a long shift register into which RDDATA is being fed, clocked with the same CLK = FCLK/2 as in usual slow mode, with several taps on that shift register which trigger certain actions, such as "shift in a 0" or "shift in a 1". As I mentioned in previous posts, there is a long shift register seen in the IWM die photo. Without spending way too much time to analyze the die photo any further, I can't say if that is the shift register I need in my "C" model to produce the same behaviour. It also could be just a long LFSR counter, e.g. to make the 1 second delay. In MOS technology, making shift registers costs far fewer transistors and much less die area than making counters, so this could be a reason for using shift registers.

"State machines" in the more general sense were universally hated by IC designers back in the 1970s and 1980s because there were no automatic software tools to synthesize them and to do the layout. Everything had to be done by hand, and the best way to make "state machines" was to use a PLA (a programmable logic array), which comprises a regular matrix of transistors that can form a sum-of-products array. By putting in a MOS transistor at a certain crosspoint of the matrix, the "product term" or "sum term" (corresponding to AND and OR) would get another input. The act of putting transistors in at specific places is the "programming" of the logic array. Any other attempt to make a "state machine" using random logic would get you into a quagmire of irregular combinatorial logic sprinkled with flipflops (or dynamic storage nodes). The design and layout of this style of implementation is very error prone, and consumes more time (= money) for the layout. But depending on the logic equations, it may be smaller and faster than a PLA.

A lot of the early microprocessors did use this more costly approach for the control part outside the data path. But if you look closely at the die photo of the 6502, you can see a very regular array of transistors which looks like a PLA. This is the "instruction decoder" of the 6502. But it's not a true PLA. It's only product terms. There is no regular "sum" array. Instead, there is a very irregular mess of random logic between the "instruction decoder" and the data path (which comprises eight regular bit slices). This, of course, was a design decision to make the 6502 small and fast. But it certainly did cost them a lot of money to do that design and the layout.

Based on this historical perspective, I could see a good reason for the designers of the IWM to use a shift register based design style to implement the timing sequences in the IWM. Which then leads to the weird behaviour observed, where it seems to "remember" earlier RDDATA pulse positions even though one or two further RDDATA pulses have entered the digital machinery. This cannot be explained easily without invoking shift registers in lieu of "normal" state machines. (Of course, from a mathematical standpoint, even a shift register can be treated as a state machine, but believe me, you will not be able to draw the STG :-) . . . unless it's a very short / trivial shift register. For two bits it can be done, for three it's messy, and for more bits it gets intractable.)

However, with the "dead zone" or "read window" seen in the "B" revision of the IWM, most (if not all) of the bizarre behavior for closer-than-normal spaced RDDATA pulses goes away. Which leads to a very clean, lean, mean "C" model. Which can be implemented with a trivial state machine having only four state bits / 16 states, and which would be mostly compatible with the state machine seen in the DISK II.

So this is my preferred solution. No need to waste time on implementing bizarre behaviour patterns which most likely are just a side effect of the logic level implementation method used in the original IWM, revision "A". The 4 bit state machine which suffices for the 'cleaned up' revision "B" is also much better suited to be implemented in a PLD or CPLD. (Imagine wasting 16 macrocells on a stupid shift register just for the timing sequences).

So you can see where this is heading: I will only re-implement the revision "B" of the IWM.

But to do that I first need to find a "B" revision. I only have two "A" revisions. Does anyone want to swap one of his "B" revisions for an "A" revision ?

- Uncle Bernie

March 27, 2024 - 4:26pm

#25

softwarejanitor

The only IWM that I have is on a LiRON card (card for connecting a UniDisk 3.5 drive or a SmartPort device to an Apple II). Those are fairly rare and valuable (though not as much as they were before the Yellowstone card came out) and I only have one but I might be willing to loan it for non-destructive testing. I'm pretty sure it is one of the later revisions but I'd have to dig it out and check the chip to be sure it is a -B.

March 27, 2024 - 5:37pm

#26

retro_devices

I can help building the same LPT tester and send you the logs from an IWM-B version with your DOS software. The LIRON card's IWM is not socketed.

March 27, 2024 - 9:22pm

#27

baldrick

I seem to recall that there is a DIP-28 IWM chip aboard early all-in-one Macs, too.

Mac 128, 512, Plus, SE, SE/30...

There's a far greater likelihood of finding a dead Mac for harvesting than there is of pulling one from a dead IIc or Liron card.

March 28, 2024 - 3:50pm

#28

UncleBernie

Building a second test rig is not the best way ...

In post #26, 'retro_devices' wrote:

" I can help building the same LPT tester and send you the logs from an IWM-B version with your DOS software.

The LIRON card's IWM is not socketed. "

Uncle Bernie answers:

Thanks for the offer, but I don't want to waste your time with that. Building such a LPT tester takes a while and unless you later turn it into something more useful (as I intend to do, it will go into the Apple-1 as a memory expansion/graphics card/floppy disk card) it's a waste of time.

The difference between "A" and "B" is very small. I can turn the "A" in my test rig into the "B" situation by just changing one constant which defines the minimum RDDATA pulse distance. So it's a very trivial thing, not worth investing a lot of time or money in.

There is only one catch / possible pitfall, which is known in the field as the "fencepost bug": if you have to build a fence over a distance of N units of length, how many fence posts do you need if their distance is the length unit ? The naive answer is N, but N + 1 fenceposts are needed.

The problem applied to the IWM is the lack of precision in the language they (Apple) use in their documentation about the 'read windowing': if they say it's 6 CLKs wide, what does that really mean ? Six clocks beginning with the clock which detected the RDDATA edge or six clocks after that clock or six clocks after the RDDATA pulse goes away ?
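To make the ambiguity concrete, here is a small Python sketch which lists, for each of the three readings, the CLK numbers during which a further RDDATA pulse would be ignored. It is purely illustrative: the function, the names of the three readings, and the example values (edge detected at CLK 100, pulse lasting 2 CLKs) are assumptions for the example, and nothing here says which reading the real Rev. "B" uses.

```python
def blocked_clocks(detect_clk, pulse_len_clks, width, reading):
    """Return the CLK numbers covered by the 'read window' for one of
    three readings of 'the window is `width` CLKs wide'."""
    if reading == "from_detect":
        # Window begins with the clock which detected the RDDATA edge.
        start = detect_clk
    elif reading == "after_detect":
        # Window begins with the clock after the detecting clock.
        start = detect_clk + 1
    elif reading == "after_pulse":
        # Window begins once the RDDATA pulse has gone away.
        start = detect_clk + pulse_len_clks
    else:
        raise ValueError(f"unknown reading: {reading}")
    return list(range(start, start + width))


# Hypothetical example: edge detected at CLK 100, pulse lasts 2 CLKs.
for reading in ("from_detect", "after_detect", "after_pulse"):
    print(reading, blocked_clocks(100, 2, 6, reading))
```

All three windows are 6 CLKs wide, but they end at different clocks, so a model built on the wrong reading would be off by one or more CLKs for closely spaced pulses.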

This can only be examined using a real IWM, rev. "B".

If I had one specimen, it would take me 10 minutes to find out. But as you mentioned, the "Liron" card (and the Macs) always has the IWM soldered in, and the same is true for all the Apple IIc I ever repaired due to defective DRAMs or IWMs (there is a link, when the DRAM fails, but the machine still runs, as the bootstrap loader is in ROM, then it may say "Disk Error" and then users try to mess around with the floppy disk drive, blowing up the IWM in the process).

I'm looking for an IWM which does not come out of a very rare and valuable card.

Could buy one from UTSOURCE but as these are Chinese sellers, it may be fake (more likely fake than not).

I really appreciate your kind offer to build such a LPT card, but I don't even have a complete schematic for it: I build these simple little gadgets from 'schematics' in my head, as I have done it often enough over the past 40 years to remember all the pin numbers on the LPT connector and the pin numbers on these 74xxx ICs (it's always the same types I use). But in case of the IWM part, I drew a schematic just to make 100% sure I don't make a wiring mistake which would blow up a precious IWM, which is so notoriously hard to find. I could send you a photocopy of that. Send me a PM if you are interested.

- Uncle Bernie

March 28, 2024 - 4:24pm

#29

retro_devices

@Uncle Bernie, you can send me whatever handwritten schematic you have. Two TTL ICs and one 28 DIP socket connected to the LPT is rather simple to me. Did that kind of programmers, readers, dongles back in the day a lot.

On the other hand if just one pulse width must be measured maybe this can be done with a scope and a LIRON card without desoldering anything from it? Just by controlling the IWM from the Apple2 itself?

March 31, 2024 - 4:51pm

#30

UncleBernie

On interfacing a LIRON card to exercise the IWM in it

In post #29, 'retro_devices' wrote:

" On the other hand if just one pulse width must be measured maybe this can be done with a scope and a LIRON card without desoldering anything from it? Just by controlling the IWM from the Apple2 itself ? "

Uncle Bernie answers:

It's not just 'pulse width'. I'm not sure if 'pulse width' matters in the IWM. My software allows setting the RDDATA pulse width, and as I found out, there is no different behaviour of the IWM if the RDDATA pulse width is varied. The minimum pulse width is one FCLK. This means they run the synchronizer at FCLK (~7 MHz, or 8 MHz in the Mac). But all the other state machines run at FCLK/2 in case of the SLOW (legacy) mode I'm most interested in.

Except for some downstream details, like the MSB being delayed (this is always a FCLK event).

It is true that a LIRON card could be exercised in an Apple II without desoldering the IWM, but a specific hardware would need to be built to generate the RDDATA pattern in real time, dictated by the clocks made by the Apple II itself.

This could be done but IMHO is a waste of time. It is quicker to add a 50-pin "slot socket" to such a test rig to be able to exercise the IWM in a LIRON card without desoldering the IWM.

BTW, if you use truly professional desoldering equipment (a soldering iron with a hollow tip with suction provided by an electrical vacuum pump) the risk to destroy the IWM in the process is low, but alas, it is not zero. These are 40 year old ICs now, and formation of intermetallic compounds at the point where the gold bond wire meets the aluminum bond pad has progressed for 40 years. This is a solution process which cannot be stopped, and the thin layer of intermetallic compound is very brittle. So a little bit of mechanical / thermal stress can cause a crack to develop, and then this pin has an intermittent contact (or no contact at all).

For rare "unobtainium" ICs like the IWM it's better to leave them where they are. Note that even extraction from an IC socket causes enough mechanical stress to crack these nasty intermetallics.

Desoldering of a defective IC of course is a different story. It will go into the trash can anyways.

So at the moment I'm exploring ways how a complete Liron card could be plugged into my test rig. I can test the hardware and the software out with a DISK II controller card. Then I need a Liron card with a Rev. "B" IWM as a loaner. I did some search for Liron card photos on the web and it seems that many have Rev "B" IWMs. Ironically, from the photos on the web, early 128 kByte Macs seem to have Rev "A" IWMs. Not sure if the photos on the web are a good representation of the statistical distribution of the IWM revisions on these cards, but at least we know that Liron cards with Rev "B" IWMs do exist and that a blindly bought 128k Mac may not contain the wanted Rev "B" IWM.

This Liron card interfacing work of course slows me down. The only schematics I found on the web for the Liron card are in Eagle PCB which first needs to be installed. They were put on github by Steve Chamberlin of BMOW fame, when he gave up on the 'Yellowstone' controller design for a while. This was in 2019. Now, five years later, he has completed the design and sells the 'Yellowstone' controller on his website. Which is good news because it substitutes the 'Liron' card, and this means that the 'value' of real Liron cards will fall. Maybe to the point where I can risk to desolder a Rev "B" IWM from one ?

- Uncle Bernie

P.S.: will send you the schematics for the test rig as a PM

April 1, 2024 - 4:17am

#31

retro_devices

Hi. Unlike your prediction the single burnt IWM I have does not work at all, not only its phase outputs are failing. It is generally high impedance on almost all of its pins. Luckily I have a -B variant that is in its PCB (device) but is removable and I am willing to try with it. I sent you a couple of questions directly to your email because I am worried about the way the data bus input is accomplished in your test circuit.

April 5, 2024 - 3:37pm

#32

UncleBernie

Anyone having Eagle PCB software Version 6.6.0 and up ?

Hi fans -

unfortunately this project made no progress since I found out about the "dead zone" feature of the IWM Rev. B

Since I can't get my hands on a functional desoldered IWM Rev. B, but I might be able to do the experiment with a Liron card (without desoldering anything on it) my plan was to extend my test rig with a 50-pin slot socket and plug a Liron card in.

But for planning this I need a schematic of the original Liron card - just to inspect it and make sure how to drive the IWM on it correctly.

Now, Steve Chamberlin of BMOW fame has put his early (Y2019) work on reverse engineering the Liron card on github, see here:

https://github.com/steve-chamberlin/fpga-disk-controller

it contains his reverse engineered Liron schematic in the eagle/Liron - original subdirectory. So my grand plan was to

a) re-install Windows XP on an old machine

b) re-install Eagle

c) start Eagle, load the schematic, and print it.

After I had wasted several days on this effort, it turned out to be futile because the Eagle I have is a version 4 and not a version 6.6.0, so it refused to read the schematic. Eagle 4 does not even recognize the new file format they used later.

So all my precious RQLT was wasted on this futile effort and I still don't have the schematic.

So here is a humble request:

If anyone out there who follows this project, and has Eagle 6.6.0 and up, please download Steve's work from github and print the schematic of the original Liron card as a pdf, and send it to me.

I can't continue this project without that schematic, so any help would be much appreciated.

- Uncle Bernie

P.S.: There is an important lesson from this waste of time

If you decide to put something up for grabs ("open source"), either on github or anywhere else, always add a pdf of your schematics, Gerbers for the PCB, plain text files (editable with vi) for the source code, etc., so that interested people not having the license to proprietary CAD software still can read, print, and use it.

Anything using just proprietary file formats is useless (unless people would want to buy a license for said proprietary software), and when it's useless it's worthless and wastes everybody's time. Imagine what happens when somebody wants to read or use your work in 10, 25, 50 years and the proprietary software is unobtainium.

(as far as Eagle is concerned, no, I'm not going to buy yet another license, and worse, Eagle was sold to some larger outfit and these greedy bastards only offer time limited software licenses. I predict that this will be the death of Eagle. Which was a fine PCB layout software before it was sold. I bought a DIPTRACE license instead, because it is not time limited. I would never, ever buy any software which has a time limit or recurring license fees. Not even Cadence (it's too expensive for hobbyists or small businesses, but that doesn't matter because typical hobbyists can't do full custom IC design and small businesses could not pay the salary of an experienced full custom IC designer anyways).)

April 5, 2024 - 3:35pm

#33

justinmc

Liron schematic

" If anyone out there who follows this project, and has Eagle 6.6.0 and up, please download Steve's work from github and print the schematic of the original Liron card as a pdf, and send it to me.

I can't continue this project without that schematic, so any help would be much appreciated. "

Is this what you're looking for?

https://www.applefritter.com/files/2024/04/05/liron-schematic.pdf

Justin

April 5, 2024 - 3:48pm

#34

UncleBernie

Internet search blindness strikes again !

In post #33, justinmc wrote:

" is this what you're looking for ?

https://www.applefritter.com/files/2024/04/05/liron-schematic.pdf "

Uncle Bernie thanks you !

And, to be honest, I did not look for it on Applefritter. What I did was a Google search which led me to Steve Chamberlin's github site.

It's the same issue all the time:

- almost everything you want to look at or need is on the internet, somewhere.

- with the right search engine and the right keywords you can find it.

- use the "wrong" search engine or the "wrong" keywords and it won't be found, or buried after 100's of pages of bogus hits.

I had that happen to me more than once. Happens all the time !

Now I'm glad to have the schematic and from a quick inspection I see no reason why interfacing the Liron card to my test rig should be difficult.

- Uncle Bernie

April 5, 2024 - 3:55pm

#35

frozen signal

Autodesk Eagle Viewer

Also, you can use their online viewer here: https://viewer.autodesk.com/

You need to "Sign up for free" though. But I was able to use the .sch files from the github repo you mentioned and generate the schematics.

April 5, 2024 - 3:55pm

#36

justinmc

I uploaded it

You're welcome. I had eagle 9.6 already installed so I just printed the pdf and attached it to my reply to you. So it wasn't on here before 15 minutes ago.

Justin

April 5, 2024 - 4:50pm

#37

robespierre

doco

In case they are not already known, there are several Apple documents about the IWM here:

http://mirrors.apple2.org.za/Apple%20II%20Documentation%20Project/Chips/IWM/Documentation/

In particular, it seems the notes by Bob Bailey (iwm_19831129.pdf) may pertain to questions about the read chain timing.

There is another doc by Bailey (apple2_IWM_INFO_19840228.pdf) that describes the one-shot multivibrator used to adjust to out-of-spec magnetic domain flips on the floppy disk.

April 5, 2024 - 5:11pm

#38

DosFox

Lisa 2/5 IO Floppy Controller

Not sure if this is helpful at all, but the Lisa 2/5 used a derivative of the Apple II floppy controller design that I feel is closer in design to the IWM.

https://github.com/alexthecat123/Lisa-PCBs/blob/main/2%3A5%20I%3AO%20Board%20Schematic.pdf

Page 4 shows the floppy controller interface.

April 6, 2024 - 5:05pm

#39

UncleBernie

Work resumed ... and the motor timer is next.

.... and I'm currently trying to figure out how the motor turn off timer is implemented. I have some strong evidence that the shift register like structures seen in post #16 are indeed a LFSR for the motor timer (as mentioned in post #18) and not necessarily double used for the read bit cell timing, which still is a mystery (I already had the pdfs mentioned by 'robespierre' in post #37). That read one shot can be modelled as a simple 4 bit counter, which, with the appropriate slow or fast clocks, does the same thing as the real IWM (and runs in lockstep with it, proven by the long run in post #19), so I'm almost there. But this run had limited the minimum distance between RDDATA pulses to 'sane' values which occur in real floppy disk drives using 4us bit cells. The exact mechanism of what can be observed for shorter (pathological) RDDATA pulse distances is still unknown, and very weird, because so far I can explain it only with yet another shift register to do the timing ... think an extension of the RDDATA synchronizer shift register. These pathological cases may go away with the Rev. B's added 'dead zone' logic. At least I'm sure that there will be fewer 'pathological' cases to implement.

The motor timer is yet another mystery. I'm quite sure it's said LFSR. Here is a bit of theory and my line of reasoning:

* binary counters are very expensive to implement in that NMOS technology, in terms of area and transistor count. LFSR based counters are the most efficient. Just as a fun sidenote, those of you who had the LED version of the Texas Instruments TI-30 calculator of the 1970s and marveled at the nice 'spinning' digit while it was calculating transcendental functions, you now know why: the program counter of that very primitive CPU was based on a LFSR and did double duty to drive the segments of the display.

* The IWM has an internal CLK which is FCLK/2, so we can assume the designers of the IWM used that, to shorten the LFSR by one bit.

* I ran some experiments after IWM power up and found that the motor timer would produce inconsistent delays (in terms of FCLK count) unless the motor control bit (L4) was set for more than a certain number of clock cycles (did not explore the exact threshold yet). The most logical explanation for this is that a set L4 just shifts '1's into the LFSR, to initialize it. Once L4 gets cleared, the LFSR goes through its pseudorandom sequence until it reaches a value where some logic detects an end value. This then would be interpreted as a motor timer which has run out, and the motor would be turned off (unless the IWM is configured with motor timer disabled).

* There is evidence for this logic on the shift register. If you look at the IWM die photo under magnification you can see that the (alleged) shift register has two horizontal parallel metal lines carrying the internally generated non-overlapping clock phases (PHI1, PHI2, typical for dynamic NMOS logic of the time) but there also are two further horizontal parallel metal lines which form two NOR gates with many inputs, coming out of the shift register. Here is an annotated snip of the die photo: (please note that I did not look too long at it, so take this with a grain of salt)

IWM_shifter.JPG

Shift register like structure in the IWM

The "L" I drew in red is a poly line which forms a gate of a NMOS transistor, the source being grounded (the GND ? means "tentative") and the drain going to the summing metal line of one of the NORs. The usual depletion load transistor to pull up the line (and complete the NOR) is elsewhere.

This type of circuit can detect any state of the shift register. But I can't know if the shift register is negative logic or not, nor its length, nor its feedback terms (the polynomial). This would require a more thorough and time consuming analysis of the die photo. Which I don't want to do.

* Now, if we assume that the designers of the IWM wanted minimum transistor count, the most likely candidate would be a 22 bit LFSR with the polynomial P(x) = x^22 + x^21 + 1. This would require a two input XOR gate with inputs driven by the last and the previous last shift register stage, feeding back to the first shift register stage.

When initialized with all '1', this will reach all '1' again after 2^22 - 1 = 4,194,303 shifts. This would be 8,388,606 FCLKs. But the motor timer in the IWM runs for 8,322,817 FCLKs. So they either use a different polynomial, or a different start value, or they detect the state of the LFSR at (or around) 4,161,408 shifts using said NORs. (There is an uncertainty of a few FCLKs give or take, as there may be additional clocked dynamic logic stages downstream, until the motor turns off).

* My take on this (unless I'm completely mistaken or in the woods, and I see ghosts, aka LFSRs which are not there) is that this complication over a plain vanilla LFSR with P(x) = x^22 + x^21 + 1 is, most likely, intentional. You may ask why this makes sense (or not).
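The LFSR arithmetic in the list above can be cross-checked with a short Python sketch. It models a generic shift register whose last two stages are XORed and fed back into the first stage, started from all '1', and counts the shifts until all '1' comes back. Everything here is an assumption for illustration (function name, bit ordering, shift direction); it says nothing about how the real IWM is wired. The small 4 bit and 7 bit cases run instantly, while the 22 bit case needs about 4.2 million loop iterations and is therefore left as an optional call.

```python
def lfsr_period(nbits):
    """Count the shifts until an LFSR with P(x) = x^n + x^(n-1) + 1,
    started at all ones, returns to the all-ones state."""
    mask = (1 << nbits) - 1
    state = mask                      # initialized with all '1'
    shifts = 0
    while True:
        # XOR of the last and the previous-to-last shift register stage ...
        feedback = ((state >> (nbits - 1)) ^ (state >> (nbits - 2))) & 1
        # ... is fed into the first stage while all other bits move on by one.
        state = ((state << 1) | feedback) & mask
        shifts += 1
        if state == mask:
            return shifts


period_4 = lfsr_period(4)             # small stand-ins for the 22 bit case
period_7 = lfsr_period(7)
print(period_4, period_7)

# Numbers quoted in the post, with one shift per CLK = FCLK/2:
full_sequence_fclks = 2 * (2**22 - 1)
observed_fclks = 8_322_817
shortfall_fclks = full_sequence_fclks - observed_fclks
print(full_sequence_fclks, shortfall_fclks)

# Optional, takes a few seconds in plain Python:
# print(lfsr_period(22))
```

The shortfall printed at the end is the number of FCLKs by which the observed timeout is shorter than a full run of the plain 22 bit sequence, which is the discrepancy that still needs an explanation.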

A LOGICAL REASON FOR THE MOTOR TURN OFF TIMER BEING COMPLICATED / OBFUSCATED

Well, we all know that Apple was plagued by copycats since the Apple II came out. The Apple II, in its original TTL-only based form, was ridiculously easy to copy. The Taiwanese knock-off artists just had to desolder all components from an original Apple motherboard (or slot card) and then digitize it to get the films to make clones of the PCB. The contents of the ROMs and PROMs were copied verbatim. The rest of the components were industry standard ICs. Bingo, after maybe two weeks of reverse engineering work invested, they had an Apple II clone fabrication line up and running, and exported the copycat product to all markets worldwide (Apple was really hurting from that). I remember that back in the 1970s and early 1980s, most "Apple II" I saw were Taiwanese clones, and not the real deal.

So it is quite obvious that Apple sought to put a stop to that, and the result was the Apple IIe and IIc with the MMU, IOU and IWM full custom ICs. And it was (and still is) quite common for full custom ICs to have a few little added complications in them which are not documented anywhere, down to outright "boobytraps" (like in the early Z80) which render any attempt to copy the mask layout futile. So why would the IWM designers not spend a few extra transistors to make the motor timer less easy to duplicate. Software made by Apple could check the number of CPU cycles needed until the timer expires, and detect any IWM copy where the timing is off by a few clocks.

I don't know if my somewhat paranoid conjectures about the motor timer's other potential purpose (detect knockoff IWMs) are justified, and if such Apple software (dealership / service center diagnostics ?) checking the motor timeout that stringently indeed exists. But it would be the only way to check the authenticity of an IWM. And in my career as an IC designer I have seen things in terms of IC "copycat poisoning" you wouldn't believe. This may also explain why some Chinese knockoffs of Western ICs can't ever reach the specs of the originals. Even their voltage regulators suck. Their opamps are catastrophically bad, too. Which does not hinder Chinese IC counterfeiters from re-labeling this trash with the type number and manufacturer logo known for excellent opamps with stellar specs. These counterfeits are so good looking that they may slip through the incoming inspection of OEMs and end up on their circuit boards. Which then fail tests. And the "bad" ICs ended up at our company's QA department. Customer claimed it failed our datasheet spec. Sure enough, QA found a 741 type die in it, and a poorly performing 741 at that. No wonder the fake could not meet our specs (for our super high performance opamp, selling at boutique prices).

As we have seen with the Bulgarian clones of the Apple IIe using knockoff MMU and IOU ICs, use of full custom ICs for protection against copycats does not protect against nation state actors, especially Communist ones, where the work of their slaves (everybody other than inner party members who live in luxury are slaves) is essentially free for the parasitic government who enslaves them (leading to the slogan: "they pretend to pay us, we pretend to work", which, alas, is also coming to "Western" countries in which wages can't keep up with inflation. Take this just as a warning for things to come.).

THE MYSTERIOUS IWM TEST MODE

Part of that complication is the software programmable TEST mode of the IWM which shortens the motor timeout to ~65,570 FCLK cycles. It also may do funny things with reconfiguring some other logic in it. I don't think they have overdone that (in terms of added transistors), because all this costs money, but it's another thing to deal with.

CONCLUSION

Faithful reproduction of fully custom ICs is not trivial, even for simple ones like the IWM. But there are reasons to believe that a faithful reproduction that is clock cycle exact in all functions may be justified to avoid nasty surprises. I've naively thought I could use a RC time constant to do the motor off timing delay, but now I have second thoughts. And a 22 bit LFSR made with PLDs is costly. FPGAs fare better but once a real binary counter is invoked by the HDL, it gets costly, too, at least for small FPGAs. Maybe it should be an optional feature, just in case: add the exact delay IC as an option, and only if needed. But I also think that no third party software (other than Apple's own software) would try to authenticate that the IWM is the original. Because third party software manufacturers couldn't care less if their software runs on authentic hardware or on a knockoff.

So far for today.

Comments invited !

- Uncle Bernie

April 17, 2024 - 9:22pm

#40

UncleBernie

Mystery solved: why the IWM works with both NMOS and CMOS 6502

Hi fans -

this work has been stalled (and still is) because I am waiting to get a Rev B IWM in my hands. Being bored I decided to write up my findings on the inner workings of the IWM which explain why the IWM works with both NMOS and CMOS 6502. There has been a confusion / controversy over this for a long time, as evidenced by many posts on the topic. I'm now able to solve the mystery (for those who are interested).

First, read mode never has been questioned. On the software side, it's just a loop which checks for the MSB of the disk data deserializer shift register getting set. If it is set, the shift register contains a complete disk byte (or disk "nibble", confusing Apple terminology which I want to avoid - it came from the very first DISK II system where any "disk byte" only contained 4 bits = a "nibble". Alas, they kept this terminology when the improved encoding could do 6 bits per "disk byte").
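For readers who have never seen that loop, here is the idea as a minimal Python sketch. The real thing is a tight 6502 polling loop, of course; the function name, the callable standing in for the data register, and the poll limit are all hypothetical and only there to keep the example self-contained.

```python
def read_disk_byte(read_data_register, max_polls=1000):
    """Poll the data register until its MSB (bit 7) is set, then return
    the complete disk byte. max_polls is an assumed safety limit."""
    for _ in range(max_polls):
        value = read_data_register()
        if value & 0x80:              # MSB set: a complete disk byte is in
            return value
    raise TimeoutError("no valid disk byte seen")


# Fake register showing the disk byte $D5 being shifted in bit by bit.
samples = iter([0x01, 0x03, 0x06, 0x0D, 0x1A, 0x35, 0x6A, 0xD5])
disk_byte = read_disk_byte(lambda: next(samples))
print(hex(disk_byte))
```

Every value before the last has bit 7 clear, so the loop keeps polling and only returns once the shift register holds the full byte.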

THE READ MODE BUG

Not questioning read mode, however, was a fallacy, as it turned out. The IWM has a bug which renders its synchronous read mode (the default after power up) too unreliable for the NMOS 6502. Apple document 343-0041-B (same number as the IWM Rev. B, also known as the Rev 19 IWM spec) has a warning about this bug on page 2. It boils down to the MSB being handled differently from the other 7 bits, probably a leftover from the "port mode", and so the read timing requirements of the NMOS 6502 get violated occasionally. The document also claims that the 65C02 has regenerative feedback on the internal data latch and hence does not suffer from the bug. I think this is NOT the proper way to sweep such a problem under the rug. But all Apple IIc in the field prove that their "solution", or better, "non-solution", works. All Apple IIc use a 65C02.

THE WRITE MODE PROBLEM - and how they solved it

It was believed by many Apple II aficionados that the IWM would not work in write mode if the 65C02 was replaced with a NMOS 6502. As it turned out, this was a fallacy, because the IWM designers found a subtle trick to make it work with both the NMOS 6502 a n d the CMOS 65C02.

This is what was suspected: the DISK II floppy disk controller critically depends on the "phantom read" cycle of the STA abs,x instruction to make the write mode work. In the "phantom read cycle", which is CPU cycle #4 if the instruction's opcode fetch is CPU cycle #1, the 6502 produces a read to the effective address EA without accounting for a potential page crossing. This gives the 6502 enough time to bump up the EA to the next page, if the index X caused a page crossing. In the next cycle, CPU cycle #5, the write occurs and the 6502 drives the data on the data bus, where the state machine in the DISK II immediately grabs it and loads it into the shift register, without even looking if it was a write cycle at all. In fact, in that CPU cycle #5 the DISK II does not even look if it was addressed, because it "saw" the L6 control bit to be set in CPU cycle #4 and consequently assumes that the very next CPU cycle is the one where the data byte to be written appears on the data bus. Based on this assumption, it fetches it blindly. The very next instruction in the RWTS usually turns this "load" mode off again by resetting L6. The state machine ("Woz machine") now is in shift mode and will shift out a bit every 4 CPU cycles, for a total of 4 x 8 = 32 CPU cycles. At exactly that point, in cycle #33, the next STA abs,x must have provided the next "disk byte" to be loaded into the shift register. For this, L6 must have been set to get into LOAD mode (as described before). If L6 was not set, the state machine will write two more bits to the floppy disk, both zero. This makes a total of 40 CPU cycles and produces the "SYNC" bytes on the floppy disk.
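The 32 / 40 cycle arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting, not of the real state machine; the function name written_bits and the (byte, gap) input format are made up for this example.

```python
CYCLES_PER_BIT = 4  # one bit is shifted out every 4 CPU cycles


def written_bits(stores):
    """Return the bit stream produced by a sequence of stores.

    stores is a list of (byte, gap) pairs, where gap is the number of CPU
    cycles until the next byte is loaded. A gap of 32 gives a plain 8 bit
    disk byte; a gap of 40 appends two zero bits (a "SYNC" byte when the
    byte itself is $FF).
    """
    out = []
    for byte, gap in stores:
        if gap not in (32, 40):
            raise ValueError("stores must be exactly 32 or 40 cycles apart")
        bits = [(byte >> (7 - i)) & 1 for i in range(8)]
        extra_zeros = gap // CYCLES_PER_BIT - 8   # 0 for 32, 2 for 40
        out += bits + [0] * extra_zeros
    return out


# A sync byte followed by a normal disk byte.
print(written_bits([(0xFF, 40), (0xD5, 32)]))
```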

WHY THE PHANTOM READ CYCLE IS NEEDED IN THE DISK II

The problem is this: the state machine is clocked with a gated Q3, and the gate is a NAND. So for each CPU cycle, the state machine gets two clocks, but, alas, the active positive clock edge from the NAND occurs when Q3 falls, close to the end of the PHI1 and PHI2 phases of the CPU clock (actually, three 14M cycles before the end). For a timing diagram, refer to the "SYSTEM TIMING" section of the "Apple II Reference Manual" of 1979, which came with every Apple II.

In the state machine clock of PHI1, the state machine can't "see" the upcoming change of L6 because the 74LS259 which contains L6 is gated by PHI2 (via the device select line, pin #41 of the slots). So it can "see" the change of L6 no sooner than the PHI2 phase, and it can change state when Q3 falls near the end of this cycle. But this is too late to tell the shift register to load a data byte from the bus ! The control signals to the shift register can be changed to LOAD on that state machine clock, but the LOAD can only happen on the next clock of the state machine (and shift register) and this happens towards the end of the following PHI1 cycle, when the data on the data bus is long gone (or only is 'bit vapors' which disappear in the mist ...)

The "Phantom read" cycle, CPU cycle #4, comes to the rescue. It will set the L6 in its PHI2 phase, the state machine will advance to the load state and set the shift register up for the actual load. The following CPU cycle #5 is the write cycle and the shift register will be loaded exactly at the time when the wanted data is present on the data bus.
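As a rough illustration, the one cycle delay between "L6 seen" and "shift register loaded" can be put into a toy Python model. Everything here is a simplification under stated assumptions: the function name load_result is made up, the bus is reduced to CPU cycles #4 to #6 of the STA abs,x, and the value on the bus in cycle #6 (here $EA) is just a placeholder for whatever the next opcode fetch brings.

```python
def load_result(has_phantom_read, data=0xD5, next_opcode=0xEA):
    """Return (load_cycle, byte_loaded) for one STA abs,x in a toy model.

    Rule taken from the description above: the state machine sees L6 in
    the first cycle that addresses the device, and the shift register
    loads from the data bus one CPU cycle later.
    """
    # (cpu cycle, device addressed?, data bus contents)
    bus = [
        (4, has_phantom_read, None),  # phantom read; bus value irrelevant
        (5, True, data),              # the real write cycle
        (6, False, next_opcode),      # assumption: next opcode fetch
    ]
    l6_seen = next(cycle for cycle, addressed, _ in bus if addressed)
    load_cycle = l6_seen + 1
    bus_by_cycle = {cycle: value for cycle, _, value in bus}
    return load_cycle, bus_by_cycle[load_cycle]


print(load_result(has_phantom_read=True))   # load lands on the write cycle
print(load_result(has_phantom_read=False))  # load lands one cycle too late
```

With the phantom read, the load coincides with cycle #5 and picks up the data byte. Without any access to the device in cycle #4, this model loads one cycle late and grabs the wrong bus contents.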

This is the most simplified description I could cook up. The actual process is a little bit more complicated, because once in write mode (L7 set) there is no provision to synchronize the state machine to the 32/40 CPU cycle sequence. Instead, the write mode is always entered from the "write protect sense" mode and this, with proper CPU cycle counting, will synchronize the state machine to the write loop. It is of utmost importance that from that point in time on where the write mode is entered, a l l the write cycles from the STA abs,x must be e x a c t l y 32 CPU cycles or 40 CPU cycles apart. Otherwise the software side will lose synchronization with the state machine happily huffing and puffing along and picking up whatever random data is found at the data bus when the state machine reaches its LOAD state. Needless to say, this is catastrophic because once synchronization is lost, only trash will be written to the floppy disk. But as long as no interrupt intervenes, the system works robustly.

NO PHANTOM READ IN THE CMOS 6502

Alas, the CMOS 6502 fixes several "bugs" in the NMOS 6502, and one group of these "bugs" is related to the handling of page crossings. I think these are not really "bugs" but a "side effect" which is known and documented and rooted in the limits of what the NMOS 6502 designers at MOS Technology could do at the time (1975). But in the NMOS 6502, any addressing mode with page crossing causes access to a bogus address, and the designers of the CMOS 6502 deemed this to be a threat. This has to do with flags in peripheral ICs which may be reset by access to certain addresses, and so some users may have fallen into that trap and their system didn't work as expected. The fix is easy - avoid page crossings for such operations - but programmers must read the manual and if they don't, they are not aware of the possible pitfall. I know the 6502 well, since it came out, and despite this, I fell into such a trap myself: when designing the improved ACI for the Apple-1, it happened that the RTS at the end of my added code page would do the prefetch of a bogus opcode (it does not execute) in the next page, which due to the minimalistic address decoding would toggle TAPE OUT, ruining the recording. This is a "side effect" or "bug" that is so deeply rooted in the whole 6502 system architecture that even the CMOS versions did not fix it. Oh, these were the times where CPUs were simple ... today such behaviour would cause segmentation faults galore.

So to defuse the perceived threat, the designers of the CMOS 6502 eliminated the "phantom read" by forcing cycle #4 of a STA abs,x to never access a bogus address. Instead, they made the 65C02 output a valid, guaranteed harmless address, such as the address of the opcode of the STA, or the address of the opcode of the following instruction. Different flavors of 65C02 may be different here, I didn't look all of them up, there were too many manufacturers and not all datasheets are available or show the cycle diagrams. But whatever solution they adopted, the "phantom read" is gone and the DISK II state machine will not enter the LOAD mode at the correct time, unless the RWTS software is changed. But this code then would not work with a NMOS 6502. Note that I don't say it cannot be done, because a 6502 program can detect if it runs on a NMOS or CMOS 6502 and then use the appropriate write routines with different timing to make the DISK II state machine do the right thing. It's a very slight adjustment, just at the entry of the write mode.

HOW THE PROBLEM WITH NO PHANTOM READ WAS FIXED IN THE IWM

Well, first, the IWM runs its inner state machines with the 7M clock (7 MHz) so it can change states more often per CPU cycle phase. This is true even in the synchronous write mode which is the power up reset default configuration of the IWM, even though this configuration has the added twist that the disk write bit cell timing is controlled by the Q3 clock input. So you can't use this configuration to write 2us bit cells, only 4us bit cells are possible. But here is the trick they used to make the "phantom read" optional: the IWM never loads its shift register directly from the data bus. Instead, it has a write data hold register. And whenever the "data register" address of the IWM is accessed with address line A0 = 1, which is a write command, the data byte from the data bus will be loaded into the data hold register first, while the shift register is still shifting out data ! According to the IWM spec, there also is a "window" in which the data hold register accepts data, and this is two CPU clock cycles wide. The IWM's inner state machine, when reaching the LOAD state, after 32 or 40 CPU cycles, will load the shift register from that data hold register, and not from the data bus. The content of the data hold register is whatever data byte the CPU wrote last during the two cycle long write window. This relaxes the critical timing over the DISK II and allows the IWM to work with both the NMOS 6502 and the CMOS 65C02, regardless if the "phantom read" cycle is there or not.
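The effect of the hold register can be shown with a small Python model. This is a sketch of the double buffering idea only, not of the real IWM internals: the class IwmWritePath, the function run and the cycle numbers are made up for illustration, and the exact position of the two cycle wide write window is deliberately not modelled.

```python
class IwmWritePath:
    """Toy model: CPU writes land in a hold register, not the shifter."""

    def __init__(self):
        self.hold = 0x00    # write data hold register
        self.shift = 0x00   # shift register feeding the disk

    def cpu_write(self, value):
        # The shift register keeps shifting undisturbed while this happens.
        self.hold = value & 0xFF

    def load_state(self):
        # LOAD state: the shifter is fed from the hold register,
        # never directly from the data bus.
        self.shift = self.hold
        return self.shift


def run(write_cycle, load_cycle=8, value=0xD5):
    """Write at write_cycle, load at load_cycle, return the loaded byte."""
    iwm = IwmWritePath()
    for cycle in range(1, load_cycle + 1):
        if cycle == write_cycle:
            iwm.cpu_write(value)
        if cycle == load_cycle:
            return iwm.load_state()


# The write may arrive a cycle earlier or later without changing the result.
print(hex(run(write_cycle=6)), hex(run(write_cycle=7)))
```

In this model the byte only has to arrive some time before the LOAD state, which is the point of the trick: the exact CPU cycle of the store no longer matters, so it makes no difference whether a "phantom read" preceded it.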

CONCLUSION

The mystery about the IWM working or not working with the NMOS 6502 vs. the CMOS 65C02 has been solved. It turned out that it's the opposite of what some people (including me) had suspected:

It turned out that the IWM synchronous write mode (the power on reset default) works well with both the NMOS and CMOS 6502, because its designers have accounted for the "phantom read" cycle being present or not. The IWM works in both cases.

But it also turned out that the IWM synchronous read mode (the power on reset default) does not work reliably with the NMOS 6502, due to a bug in the IWM which Apple did not even correct in the IWM revision B.

So the perception of the problem shifted to losing the synchronous read mode (vs. the suspected / debunked loss of the synchronous write mode) when the IWM is used with a NMOS 6502.

Note that Apple only tells us that the synchronous read mode of the IWM just gets "unreliable" when a NMOS 6502 is used, so it may work under certain circumstances and may not work under other circumstances. I intend to do some experiments on this (plugging various NMOS 6502 in an Apple IIc) to find out more. Alas, the firmware ROM of the IIc also must be changed to a version which does not use the extra instructions of the CMOS 65C02.

Also note that all the asynchronous read and write modes of the IWM work with any flavor of the 6502. Alas, despite the holding registers for both read and write mode, and a further relaxed timing over the synchronous modes, the FAST configuration with 2 us wide bit cells (double the storage capacity, equivalent to the step from FM to MFM in industry standard floppy disk systems) has only 16 us per disk byte and a 6502 running at 1 MHz can't handle that unless coding tricks are used which increase the RWTS code space to a point where it becomes nonsensical to do that.
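The cycle budget behind this statement is simple arithmetic, sketched here in Python. The function name cycles_per_disk_byte is made up, and the CPU clock is taken as a round 1 MHz as in the text above.

```python
BITS_PER_DISK_BYTE = 8


def cycles_per_disk_byte(bit_cell_us, cpu_mhz=1.0):
    """CPU cycles available to handle one disk byte."""
    byte_time_us = bit_cell_us * BITS_PER_DISK_BYTE
    return byte_time_us * cpu_mhz


print(cycles_per_disk_byte(4))  # slow mode, 4 us bit cells
print(cycles_per_disk_byte(2))  # FAST mode, 2 us bit cells
```

So the FAST configuration leaves only half of the 32 cycles which the standard loops have per disk byte.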

So much for my intermediate report on the IWM reverse engineering findings. I will add more once I have a Rev B IWM at hand. I'm not motivated to faithfully reproduce the weird effects seen in the IWM Rev A read mode when the distance of the RDDATA pulses gets far too low. This is a useless dirt effect which - hopefully - goes away with the Rev B having the added read pulse windowing function.

- Uncle Bernie
