UNCLE BERNIE'S IOU (and MMU) SUBSTITUTES IN PLDs


Hi fans -

You may have followed the work done by two people on substituting the MMU, see here:

 

https://www.applefritter.com/content/question-about-weird-thing-mmu-schematics

 

For those who don't know yet, the MMU and the IOU are full custom ICs in DIL-40 packages, whose design and manufacture Apple had commissioned from SYNERTEK. The initial versions were ready around end of 1981 / early 1982. They first appeared in the Apple IIe, and variants were used in the Apple IIc. From the documentation I could find it appears that they were made in an already obsolete NMOS process technology, much like the 6502 itself. So even if Apple had preserved the lithographic masks (or the GDS II data file thereof), nobody could produce them anymore today. NMOS has long gone the way of the dinosaur.

This is called "technical progress", folks. We can't make that old stuff anymore. And we can't fly men to the moon (and back) anymore, either.

 

THE MISSION STATEMENT AND THE OPTIONS

 

We need to make substitutes of the MMU and IOU (and the other custom ICs seen in the Apple II family, like the IWM and the SWIM).

 

There are basically two platform technologies which can be used to do that:

 

1) use an FPGA (Field Programmable Gate Array) and write the source code in VHDL (or Verilog).

 

2) use mid 1980s to early 1990s PLDs (Programmable Logic Devices) and write the source code in ABEL or any of the other proprietary PLD design languages, such as PALASM, CUPL, LOGIC, etc.

 

The originator ('frozen signal') of the above thread on the MMU chose to use 1), FPGA, while I chose 2), PLDs.

 

ADVANTAGES / DISADVANTAGES  FPGA vs. PLD

 

FPGAs are much more powerful than PLDs and into the larger ones a whole Apple II including the 6502 CPU could be packed. Small FPGAs with fewer pins and in a small package can fit onto an adapter PCB the size of a DIL-40 IC and that contraption plugs right into the original MMU and IOU sockets on the original Apple IIe and Apple IIc motherboards. Great to repair Apple IIe and IIc with defective MMU or IOU.

 

PLDs are not that powerful, so more IC packages may be needed to implement the MMU or IOU functions. Larger CPLDs ("Complex PLDs") of the early 1990s do have enough flipflops and 'product terms' inside to implement the MMU, the IOU, the IWM and all the other TTLs seen in the Apple IIc in one package, but those IC packages from back in the day were physically much larger than the fine pitch (or BGA) packages of modern FPGAs. So even a large CPLD which could hold all the functions would not fit on a PCB the size of a DIL-40.

 

So why use a PLD approach at all ?

 

The answer is this:

 

- if you want a solution which fits on a DIL-40 size PCB, use the modern FPGA and pay for the automated assembly by robots which is inevitable for such fine pitch or BGA packages. I don't say that it's impossible to solder these by hand, but the yield typically is low, and the ICs are ruined more often than not if hand soldering is tried. The next drawback is that these modern FPGAs run at low core voltages (extra regulator needed) and typically can't tolerate the logic levels produced by TTL or CMOS running from a 5V supply, which is the case in the Apple II family. So additional voltage clamping and level translation circuits are likely needed around the FPGA, and these clutter up the PCB, and can't be hand soldered either. Any bidirectional signals cause additional complications, because you need to add control signals to tell the level translators in which direction the level translated signal(s) flow at any given time. All this is not easy to do, and 'frozen_signal' can be applauded for having taken this much harder path.

 

- if you want a solution which can be put together by the average hobbyist using hand soldering, for instance, a replica of the Apple IIe or IIc, using a larger motherboard specifically designed for the MMU and IOU substitution PLDs, go for that ! There are no complications, all components are user friendly (they can be "handled" by human fingers) and everything still runs from a 5V supply, so no level translators are needed. This is what I want to do. I don't want to make drop-in MMU or IOU substitutes in a DIL-40 form factor. I want to design a motherboard for Apple IIe / IIc replicas, which not only allows builders to have the Apple IIe and IIc experience, but also allows them to make modifications by reprogramming individual PLDs. Which, of course, should be in sockets. The in-circuit programmable ones are a bit tricky to use, especially when coming from different manufacturers. It could be done, but IMHO is not worth the effort, at least not for hobby projects. Back in the 1990s, I witnessed too much debugging trouble with PCBs full of heterogeneous in-circuit programmable PLDs, and don't want to go there.

 

So far for today. Running out of time. In my next post, I will explain the method I use to reverse-engineer, implement and verify such PLD based substitutes. This thread is meant to preserve these methods for posterity (and for those who want to step into my footsteps).

 

- Uncle Bernie

IC reverse engineering: Revelation Of The Method.

I wrote this up because no university or college teaches "IC reverse engineering".

 

What they teach is "IC design", which uses the same theoretical background (semiconductor device physics, network / circuit design theory, logic synthesis and reduction techniques, simulation) and use of industry standard CAD tools, but while being adequate for designing ICs, it's not enough to successfully reverse-engineer ICs.

 

My method, which I developed and practiced in the 1980s and early 1990s, works, at least for reverse engineering of digital ICs of modest complexity. The ICs in question were already obsolete at that time, so many users who had legacy systems still in production sought to replace them.

 

For reverse engineering of Apple's MMU, IOU, and the IWM this method is just right.

 

Don't try that with VLSI or ULSI ICs of the 21st Century. Reverse engineering of ICs comprising billions of transistors is a completely different task.

 

THE "CLASSICAL METHOD" FOR IC REVERSE ENGINEERING AND WHO DOES IT

 

There are companies out there who specialize in reverse engineering of ICs, and they then sell the reverse engineered schematics to the semiconductor industry. I never worked for one of those reverse engineering outfits, but some of the semiconductor companies I worked for as an IC designer did buy these schematics (at a stunning price), although they would never admit that. The access to these schematics was strictly controlled and limited to upper management (who did not understand them) and higher ranked engineers such as "Senior Staff Engineers" or "Principal Engineers", and we were not allowed to tell any of our underlings that these things even existed. Officially, it never happened. But it gave some insight into what the competition was up to. Just to make things clear: doing this is absolutely legal. The cheaper way, to just steal the design database of a competitor's product, is not legal, it's industrial espionage. But reverse engineering is OK. It has been done in every industry since the beginning of the industrial age. Whether it's worth the money and time spent is another question. I think that capable engineers should not waste too much time on reverse engineered stuff, but rather focus on designing new, innovative products. But sometimes, knowing how the competition does it is helpful to achieve that, to design a better product that beats theirs. But if a whole industry relies on reverse engineering, and most of what they do is based on duplicating designs of competitors, this is the road to ruin. The Communist "Eastern Bloc" semiconductor industry (and their computer industry) was wrecked by doing that, never to recover. So much for the benefits and drawbacks of reverse engineering, and who does it.

 

THEIR REVERSE ENGINEERING METHOD

 

The reverse engineering companies remove a silicon die from the package, take microphotographs, and then reconstruct the mask data for each layer, which involves etching away layer by layer, and then staining the features they want to discern. Once the mask data is reconstructed, a transistor level netlist can be extracted using industry standard CAD tools. The netlist then can be turned into schematic diagrams. Automatic tools for that do exist in standard EDA CAD suites (such as Cadence) but typically generate utterly unreadable schematics (read: they are not worth the money). As a remedy, said "reverse engineering" companies have written proprietary tools which allow their specialists to interactively make readable schematics, which can be sold. This takes lots of hours of human work by MSEEs who are not cheap, and this explains the stunning prices for these schematics. The equipment to make the microphotographs also is not cheap, and its sophistication increases down the shrink path. For ICs made in the leading edge technologies of the 1990s, an electron microscope is needed.

 

CAN HOBBYISTS DO THAT, TOO ?

 

With long obsolete process technologies, even hobbyists can use the same method. For the semiconductor processes of the 1970s and early 1980s, a good optical microscope is enough to take the microphotographs. Greg James and Barry and Brian Silverman did that for the 6502 (see: http://www.visual6502.org/) and the resulting netlist is accurate enough to use it for simulating the 6502 running software. I wrote my own Apple-1 emulator which is based on a switch level simulation based on their netlist, and used that to develop my Woz Machine based floppy disk controller for the Apple-1. The 6502 switch level simulator is here: https://github.com/mist64/perfect6502. I just put my own emulation of the Apple-1 with its 'cycle stealing' DRAM refresh around it.

 

But is this the most efficient way to develop substitutes of long obsolete / unobtainable ICs ?

 

I think, no, it isn't. The reason is that extracting the transistor level netlist  from microphotographs taken from the actual silicon die and then making readable schematics may take several man-years. And then you still don't have a substitute. Developing such a substitute from a transistor level schematic takes more man-years. This is why the Communist "Eastern Bloc" semiconductor industry failed so miserably. (And I know what I'm talking about, after the fall of the "Iron Curtain" I visited some of their semiconductor combinates, and once stood in a room whose whole walls had enlarged microphotographs of a DEC 16-bit processor as wallpaper - and each of the many thousands of transistors had hand written sizes and node numbers on them - an enormous waste of man-years, and after all these years, they only were able to make their copycat version of a long obsolete CPU).

 

A BETTER METHOD

 

This is how I did it back in the 1980s and early 1990s. I never used reverse engineered transistor level schematics, because my customers who wanted drop-in substitutes would never have paid for my man-years. Instead, I followed much the same procedure / method as usual for regular IC design and development, but added one extra feature: running my 'paper model' aka 'high level simulation' in lockstep with the actual IC to be substituted.

This method would quickly reveal and uncover any difference between my RTL model and the actual IC.

And with the complexity of these 1970s and 1980s ICs still being  modest (compared to what we have now), it just took a few weeks to arrive at a 'paper' (actually, RTL / software) model of the IC which covered all functions of the original IC the customer needed and ran in lockstep with the real IC, being cycle-by-cycle accurate.

And then, the RTL was synthesized and partitioned / fitted into PLDs. A PLD implementation was built as a wire-wrapped adapter card which, by a flat band cable, plugged right into the same socket in the customer's target system (I'm still using the same wire wrap gun until this day).

Then, the system was booted up - and more often than not it ran. And it ran all the diagnostics.

In those few cases where it didn't, an ICE or a logic analyzer was invoked to find what was wrong. In almost all these cases, their software used "undocumented" features or "side effects" in the real ICs, and these then were added in the RTL. Rinse and repeat.

After everything ran fine, and the customer was satisfied after a thorough test and assessment process of the "new" solution, the PLDs were either put onto a small PCB which plugged into their socket (space permitting) or, if the added expense was justified, the design was put into a gate array or standard cell IC. For which the development risk, having proven RTL, was virtually nil. All of them worked perfectly in first silicon.

 

So much for a brief overview of my method. In the following posts, I will demonstrate how the method was applied to design and develop substitutes of the MMU and IOU custom ICs seen in the Apple IIe and Apple IIc.

 

Please be a bit patient and don't start commenting right now, to avoid cluttering up this thread. I first want to put the following posts with the various steps of the method up, before we can start to have comments / discussions.

 

- Uncle Bernie


STEP 1: Collect all information available for the IC to be substituted / reverse engineered.

 

This "information"  includes:

 

- datasheets and application notes from the IC manufacturer,

 

- the schematic of the target system of your customer (so you can see how they use the IC, some of its pins / functions may not be used in their application, and this can greatly reduce the time and expense for the development of the substitute).

 

- if the IC interacts with software, the relevant parts of their source code, too.

 

- third-party examples where the same IC was used (preferably technical / programming manuals thereof).

 

The last point helps to determine if implementation of the full functionality (or only the part of the functionality the customer uses) is better / justified / worth it.

 

I had one case where the customer (who needed the substitute IC for their own legacy systems) sold several times more of these substitute ICs to other "sufferers" worldwide than they consumed themselves, and more than recovered the costs they paid me for my work.


STEP 2: Build a test rig for the original IC

 

This is a very powerful aid, because it allows you to run the original IC in lockstep with your simulations of the "paper model". If you work with simulations alone, you can write as many test cases and assertions as you please, but what these actually do is compare your - possibly incomplete or wrong - fantasy of the original IC to your "paper model" (aka RTL, Verilog, VHDL ...). So by writing assertions, you are only chasing your own (possible) delusions about how the original IC works. Having the test rig is a very powerful means to correct these delusions as early in the process as possible. This is where fantasy meets reality. Proverbially, this is where "the rubber meets the road". A few hours of design and construction of the test rig can save you weeks or months of chasing bugs in a botched substitute with a logic analyzer hooked to the substitute in the host system.

 

There is one exception where such a test rig may be avoided. And this is when you already have a very solid understanding of the original IC, such as already having an implementation of an earlier version of the original IC which was built in hardware and is known to work.

In the case of the IOU, I already had designed a PLD implementation of the Apple II screen memory scanner driving a video data conversion EPROM in multiple PLDs, which was used to build my color graphics card for the Apple-1, see here:

 

https://www.applefritter.com/content/glimpse-uncle-bernies-apple-1-color-graphics-card

 

The differences between this PLD solution of the Apple II graphics system and the original IOU custom IC appeared to be trivial: in the end, just a few more soft switches and their readbacks, and slight differences in the signals driving the video ROM (because my Apple-1 video ROM has a different internal organization than Apple's video ROMs). These fine details of the IOU have been reverse engineered and explained in the book "Understanding the Apple IIe" by Jim Sather, published in 1985 by Brady Communications Co, a Prentice-Hall Publishing Company, ISBN 0-8359-8019-7. So I thought I knew all about the IOU and skipped the test rig, which in hindsight, which is 20:20, was a minor mistake (more on this later, see "Snake Eyes").

 

This test rig should have an interface to a PC so you can exercise the IC using stimuli from a model written in a high level language.

 

I use "C" because it is as fast as it gets. Direct access to the parallel printer port can easily be done both on DOS and LINUX, but not on Windows, where you would need to write a DLL / device driver for each of those projects. I once tried a generic printer port driver for Windows to run such a test rig, but it was far too slow ... I think there are layers upon layers of subroutine and system level calls involved for every harmless looking access to the port. And if you can't reach the minimum specified clock speed of that IC with your software generated clock cycles, then all bets are off - dynamic logic (which was ubiquitous back in the day for any MOS IC) has 'forgetful' logic gates and flipflops, and unless the clocks run fast enough to refresh the charges on the dynamic nodes before they leak away, it won't work. The NMOS 6502, for instance, is specified for a maximum clock period of 40 us (i.e. a minimum clock frequency of 25 kHz). On any parallel printer port directly driven by a "C" program without interference by the OS this is easy to reach even on slow PCs or notebooks from the 1990s (which still have a parallel printer port) running DOS. And DOS reigns supreme here because it's so primitive you can even turn off all interrupts while you exercise the IC. Alas, old notebooks with a parallel printer port now command higher and higher prices because there is a whole lot of industrial machinery which is driven or maintained via parallel printer ports. Before these companies throw out a $10000-$50000 CNC machine, for instance, they pay any price to get such a notebook. Which drives their prices up. The alternative, of course, would be use of USB, which is certainly fast enough for any such test rig, but the test rig won't be as simple as this example:

 

 

which I built for exercising the original MMU from an Apple IIe. Being wire wrapped, it can be reconfigured in a few hours of work for any other IC to be exercised, e.g. for the IOU. But the configuration in the above photo is for the Apple IIe MMU, which is the DIL-40 IC seen in the photo.
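
As a rough sanity check of the clock speed argument above, here is a minimal sketch in "C". The helper names are hypothetical, and so is the assumption about how many port writes one software generated clock cycle costs (at least two, one per clock edge, plus whatever is needed to set up stimuli and strobe the readback). It only does the arithmetic, it does not touch any port.

```c
/* Hypothetical helper: period of a software generated clock in
   microseconds, given how many port writes per second the port can
   sustain and how many port writes the program spends per clock cycle
   (at least 2: one for each clock edge). */
double sw_clock_period_us(double port_writes_per_sec, int writes_per_cycle)
{
    return 1e6 * (double)writes_per_cycle / port_writes_per_sec;
}

/* Returns 1 if the software clock is fast enough for a dynamic logic IC
   whose datasheet specifies the given maximum clock period, else 0. */
int clock_fast_enough(double port_writes_per_sec, int writes_per_cycle,
                      double max_period_us)
{
    return sw_clock_period_us(port_writes_per_sec, writes_per_cycle)
           <= max_period_us;
}
```

With a port that manages about 1 million writes per second and an assumed 8 writes per cycle, the period comes out at 8 us, well inside the 40 us limit of the NMOS 6502. A driver stack which slows the port down to 100000 writes per second would give 80 us with the same 8 writes per cycle, and then all bets are off.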


STEP 3: Build a "paper model" of the original IC

 

Now, start to write code (I use "C") which, via the parallel port, produces CPU cycles on the relevant pins of the IC, and reads back the response. Based on the observations and combined with the knowledge from the datasheet, you build a 'paper model' of the functions in the original IC step by step. This is a high level model, e.g. for an address decoder for a soft switch you could write:

 

 if((adr & 0xFFFE) == 0xC050) SS_TEXT = adr & 1;

 

to see if the TEXT mode soft switch of the Apple II needs to be flicked, and set it to address bit A0. (As weird as it looks, this is how Apple II soft switches work, all the info is coming from the address only. Woz was thinking out-of-the-box. The advantage of this method is that the data bus is not involved, so two operations, reading or writing data from a port via the data bus, and reconfiguring the peripheral / port via the address being used, can be done at the same time. This trick was later used in the Disk II system to great effect.)
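
To show how the one-liner above grows into a small piece of a 'paper model', here is a sketch covering the four classic Apple II display soft switches at $C050 to $C057. The struct and function names are my own illustration, not taken from the actual model. Whether the real IC reacts to reads, writes or both, and at which clock phase, is exactly what the test rig is there to confirm.

```c
#include <stdint.h>

/* Paper-model state of four Apple II display soft switches. */
typedef struct {
    uint8_t text;   /* $C050 = off (graphics), $C051 = on (text)   */
    uint8_t mixed;  /* $C052 = off,            $C053 = on          */
    uint8_t page2;  /* $C054 = off (page 1),   $C055 = on (page 2) */
    uint8_t hires;  /* $C056 = off (lores),    $C057 = on (hires)  */
} softsw_t;

/* Called once per CPU cycle with the address on the bus. */
void softsw_access(softsw_t *ss, uint16_t adr)
{
    uint8_t val;

    if ((adr & 0xFFF8) != 0xC050) return;   /* not $C050..$C057      */
    val = adr & 1;                          /* A0 is the new value   */
    switch ((adr >> 1) & 3) {               /* A2:A1 pick the switch */
    case 0: ss->text  = val; break;
    case 1: ss->mixed = val; break;
    case 2: ss->page2 = val; break;
    case 3: ss->hires = val; break;
    }
}
```

Note that no data bus value appears anywhere in the function. The address alone carries both "which switch" and "which state".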

 

State machines are written in "C", too, using 'switch()' and 'case' statements and a state variable of arbitrary encoding. I use predefined constants (e.g. #define HST_LIVE 0) for states, and the same names are then later used in the RTL code for the PLDs or gate arrays or whatever.

 

The end result of this step should be a 'paper model' which exercises all functions (as specified / chosen by the customer) of the original IC and compares the response of the original IC sitting in the test rig with its own internal state and its own response on its 'paper output pins'.
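
A minimal sketch of such a state machine follows. Only the name HST_LIVE comes from the text above. The other states, the struct, and the cycle counts are made up for illustration and are not the real Apple II video timing.

```c
#define HST_LIVE   0   /* name from the text; its meaning here is assumed */
#define HST_BLANK  1   /* hypothetical state */
#define HST_SYNC   2   /* hypothetical state */

typedef struct {
    int state;   /* one of the HST_xxx constants         */
    int count;   /* cycles spent in the current state    */
} hscan_t;

/* Advance the state machine by one clock cycle. */
void hscan_clock(hscan_t *h)
{
    switch (h->state) {
    case HST_LIVE:
        if (++h->count == 40) { h->state = HST_BLANK; h->count = 0; }
        break;
    case HST_BLANK:
        if (++h->count == 4)  { h->state = HST_SYNC;  h->count = 0; }
        break;
    case HST_SYNC:
        if (++h->count == 4)  { h->state = HST_LIVE;  h->count = 0; }
        break;
    default:                  /* illegal state: recover to a known one */
        h->state = HST_LIVE;
        h->count = 0;
        break;
    }
}
```

Because the state encoding is hidden behind the constants, it can be changed later (for instance to save product terms in a PLD) without touching the 'case' bodies.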

 

I use the term 'paper' because this is a term left over from how it was done in the 1950s and 1960s, when computer time was expensive and so most logic design and its simulation was done with pencil on paper. Great computer architects like the late Seymour Cray designed all the architecture, down to the smallest sequencing details, of their supercomputers using pencil and paper, and then handed this higher level description to their hardware design team who turned that into gates and flipflops. There are two books which are a great read for anyone interested in how that was done:

 

"The Supermen" by Charles J. Murray, published by John Wiley and Sons, 1997, ISBN 0-471-04885-2

 

"The Soul Of A New Machine" by Tracy Kidder, published by Atlantic/Little, 1981, ISBN 978-0-316-49170-9

 

The first book gives insight into Seymour Cray's design method, and the latter into the methods how the engineers at Data General designed the Eclipse MV/8000, codenamed "Eagle" during the design / development phase. And the book tells you how they first built a "Paper Eagle" to be able to verify the architectural design and debug the hardware while it was being built.

 

Until the 'paper model' flies, and runs 100% in lockstep with the real IC, don't worry about lower level hardware implementation details. Form follows function. So get the 100% match of the specified functionality first. This is a stark contrast to the usual: code - compile - program PLD/FPGA - power up - crash - head scratch - fix code - rinse and repeat cycle seen since software guys ("coders") invaded the digital design realm . . .  this rot and deviation from the "right path" is now everywhere and leads to unstructured, poorly thought out, inefficient designs. Avoid this trap. Always do a clean 'paper model' first.


STEP 4: Build a hardware adapter for the host system

 

Purpose of this adapter is to hold the original IC(s) in sockets and provide additional space for sockets for the actual hardware implementation of the substitute(s). The adapter typically uses flat band cable and associated headers to plug into the host system. The flat band cables should be kept as short as possible, but not too short, such that the adapter still can be taken out of the host system to change ICs on it, without unplugging the DIL headers of the flat band cables. This works fine for 1970s and 1980s era ICs, which were slow (except for ECL logic families). So it is fine for our Apple II work. Flat band cables would not work with high clock speed ICs of the 21st Century. There are impedance controlled flat band cables and their connector systems, which could cope with high speed signals, but these are outside the reach and expertise of the typical hobbyist.

 

Here is how my adapter for the MMU and IOU looks:

 

 

The checkout of the hardware adapter is done by plugging in the original IC(s) - as seen in the above photo, these are the original MMU and IOU of the Apple IIe - and then plugging the adapter into the host system. The yellow box in the photo indicates where a flat band cable is plugged in, which goes to two DIL-40 headers that plug into the MMU and IOU sockets. (It turned out that 40 wires to all pins of the IOU plus 10 wires to the "middle pin numbers" of the MMU were enough to access all of the required signals which are not also available from the Apple II system bus, on the edge connector which goes to the slots, from which the A[15:0] signals were taken. Otherwise I could not have saved the 30 wires on the flat band cable to the MMU, and would have needed to buy connectors.)

 

The host system should run as before, and if you have diagnostics, run all diagnostics, too.

Do not skip that checkout. It helps to make sure the hardware adapter works robustly.


STEP 5: Design the RTL of the substitute(s)

 

How this is done strongly depends on whether PLDs, CPLDs, or FPGAs are used to make the substitute. FPGAs would use VHDL or Verilog to write the RTL, based on the 'paper model'. PLDs or CPLDs would use one of the proprietary PLD design tools, such as ABEL, CUPL, PALASM, LOGIC, etc., and they put the extra burden on the designer to partition the functions over several PLDs. Sometimes, intermediate functions must be implemented because of pin limitations of the PLDs.

 

As an example, in case of the MMU substitute, I had to use an address predecoder in a 20L8 type PLD to reduce the 16 address lines to five plus four, to shoehorn most of the MMU functions into a 40 pin EP910. Five signals MX[4:0] come from the 20L8 predecoder and they convey memory ranges and soft switch selections. The four remaining address lines are A[15:12] and these convey which 4k block within the 64k address space is selected. This saves 7 pins and also avoids many product term overflows, thanks to the hand optimized encoding of the predecoder MX[4:0] outputs.
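
The predecoder idea can be sketched in "C" as follows. Beware: the MX code assignment shown here is purely hypothetical and invented for illustration. The real hand optimized encoding is not given in this post. The sketch only shows the principle: 16 address lines go in, and the main PLD sees a 5 bit code plus A[15:12], i.e. 9 signals instead of 16.

```c
#include <stdint.h>

/* HYPOTHETICAL MX[4:0] encoding, for illustration only:
   1xxxx = access to $C000..$C00F, xxxx = A[3:0]
   01xxx = access to $C050..$C057, xxx  = A[2:0]
   00000 = nothing of interest
   (00001..00111 would be free for memory range codes)           */
uint8_t predecode(uint16_t adr)
{
    if ((adr & 0xFFF0) == 0xC000) return (uint8_t)(0x10 | (adr & 0x0F));
    if ((adr & 0xFFF8) == 0xC050) return (uint8_t)(0x08 | (adr & 0x07));
    return 0;
}

/* What the main PLD gets to see: MX[4:0] in bits 8..4 and A[15:12]
   in bits 3..0, so 9 input pins instead of 16.                    */
uint16_t main_pld_inputs(uint16_t adr)
{
    return (uint16_t)((predecode(adr) << 4) | (adr >> 12));
}
```

Besides saving pins, a well chosen code lets each soft switch equation in the main PLD test a handful of MX bits instead of up to 16 address bits. The same function can be called from the 'paper model', so the model and the later PLD partitioning stay in step.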

 

When using an FPGA, no such extra work is needed. The CAD software takes your VHDL source code, synthesizes the logic, and fits it into the FPGA, all automatically.

 

But what you typically don't get from the FPGA design flow is the raw sum-of-products logic equations, which always come out of any of the proprietary PLD design tools. These become useful in the next step.


STEP 6: Verify the synthesized logic against the 'paper model'

 

This assumes that PLDs were used. For FPGAs, you can get a netlist out of the tool, which describes the whole logic. So theoretically, you could do STEP 6, too, as described here. But since it is a netlist, you would need to write a translator to "C" language equations, a non-trivial task I never tried to tackle.

 

Instead, I use a simple "C" language tool I wrote many decades ago which reads the PLD equations from the PLD design tool and translates them into "C" source code. These equations then can be copy / pasted or #included  into the "paper model" written in "C". Registers and their clocks are implemented like this:

 

    name_nq = A & B & !C ....

    if(CLK1_rise) {
      NAME = name_nq;
    }

 

In the above example, all the upper case variables are global signals in the system, visible to all functions. Lower case variables ("name_nq") are local variables visible only in the "C" function which emulates a module of the whole logic. The "_nq" is a convention to indicate "next state of q after the active clock edge". Like in the real PLD, the "name_nq" is a combinatorial function of global signals (aka "variables").
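
Here is a complete, small instance of this convention: a 2-bit counter with enable, written as sum-of-products equations. The signal names Q0, Q1 and EN and the function name are assumptions for this example only, and all signals are expected to hold 0 or 1.

```c
/* Global signals (upper case), visible to all modules. */
int Q0, Q1, EN;

/* One module: a 2-bit counter with enable. Call it once per
   simulation step; CLK_rise is 1 on the active clock edge.   */
void counter_module(int CLK_rise)
{
    /* Combinatorial next-state functions of the current global
       signals, in sum-of-products form like a PLD would hold.  */
    int q0_nq = (!Q0 & EN) | (Q0 & !EN);
    int q1_nq = (Q1 & !EN) | (Q1 & !Q0) | (!Q1 & Q0 & EN);

    if (CLK_rise) {          /* registers change on the edge only */
        Q0 = q0_nq;
        Q1 = q1_nq;
    }
}
```

All "_nq" values are computed from the old register contents before any register is updated, so the order of the assignments inside the if() does not matter. This mimics real flipflops, which all sample their inputs on the same clock edge.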

 

My proprietary tools read the output files of the PLD design tools and automatically translate them to "C" source code. All I need to do manually is to "#include" the automatically generated "C" source code snippets into my "paper model" and then run "make", to arrive at an executable which exercises either the "paper model", or the original IC sitting in the test rig, or both at the same time, and automatically compares the responses.
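
For those who want to roll their own translator, the core of it can be very small. The following is NOT my tool, just a minimal sketch under one assumption: the reduced equations use ABEL / CUPL style operators, where '&' is AND, '!' is NOT and '#' is OR. Since '&' and '!' mean the same in "C" (for signals holding 0 or 1), and '&' binds tighter than '|' just as AND binds tighter than OR in the PLD equations, only the '#' must be replaced. Parsing of the actual report files, pin names and registered outputs is left out.

```c
#include <stddef.h>

/* Copies one sum-of-products equation from src to dst and replaces
   the PLD style OR operator '#' by the "C" operator '|'.
   Output is truncated if dst is too small, and always NUL terminated
   (if dst_size > 0). Returns the number of characters written,
   not counting the NUL.                                            */
size_t pld_eq_to_c(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    if (dst_size == 0) return 0;
    for (; *src != '\0' && n + 1 < dst_size; ++src) {
        char c = *src;
        if (c == '#') c = '|';
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

Feeding it the text "name_nq = A & !B # C;" yields "name_nq = A & !B | C;", which compiles as is once the signals are declared as int variables.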

 

The stimuli (aka "test cases") for that boil down to calls to cpu_cycle() or hw_cycle() functions, the former running one CPU cycle of the 'paper model' and the latter running one cycle on the real IC in the test rig. Timing constraints of dynamic logic permitting, this can be done quasi simultaneously (we have 2+ GHz clock speed CPUs for that, the bottleneck actually is the parallel printer port, which on typical notebooks can do about 1 million port changes per second "only" ;-)

 

The test cases as such have to be written by hand, of course. But "C" being a far more powerful and concise high level language than VHDL or Verilog, even the most complex test cases can be written in very little time. (I also have proprietary tools which translate my "C" test cases into VHDL or Verilog test suite source code, but I have not used these since the day I retired from my job in the semiconductor industry - I used to hand them to the 'digital designers' in my design teams, to make sure that I got what I wanted).
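
The lockstep comparison can be sketched like this. Only the names cpu_cycle() and hw_cycle() come from the text above. Their signatures, the stimulus struct and the two demo functions are assumptions made up for this example. In particular, the demo "hardware" function is a software stand-in with a deliberately invented decode difference, so the sketch runs without any test rig attached and says nothing about how the real IC decodes.

```c
#include <stdint.h>

typedef struct { uint16_t adr; uint8_t rw; } stimulus_t;  /* hypothetical */
typedef uint8_t (*cycle_fn)(const stimulus_t *);  /* returns output pins */

/* Runs model and hardware through the same stimuli, one cycle each.
   Returns the index of the first mismatching cycle, or -1 if none.  */
long run_lockstep(cycle_fn cpu_cycle, cycle_fn hw_cycle,
                  const stimulus_t *stim, long n)
{
    long i;
    for (i = 0; i < n; ++i) {
        uint8_t model = cpu_cycle(&stim[i]);
        uint8_t real  = hw_cycle(&stim[i]);
        if (model != real) return i;
    }
    return -1;
}

static uint8_t text_model, text_chip;

/* Demo 'paper model': TEXT soft switch decoded at $C050/$C051. */
uint8_t demo_cpu_cycle(const stimulus_t *s)
{
    if ((s->adr & 0xFFFE) == 0xC050) text_model = s->adr & 1;
    return text_model;
}

/* Demo stand-in for the real IC, with a made-up wider decode. */
uint8_t demo_hw_cycle(const stimulus_t *s)
{
    if ((s->adr & 0xFFFC) == 0xC050) text_chip = s->adr & 1;
    return text_chip;
}
```

With the stimuli $C051, $2000, $C050, $C053 the first three cycles match, and the fourth one reports a mismatch. That is the moment where the lockstep run exposes a wrong belief baked into the model, which no assertion written from that same belief could ever have caught.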

 

Not having these proprietary tools which I wrote over more than 40 years to allow me to do "miracles" (actually, gaining an edge over the colleagues not having or knowing my tools and methods) you may need to code your test cases in either VHDL or Verilog directly. But believe me, even if you are very proficient and well versed in VHDL or Verilog, you will need 3 to 5 times longer to reach the same coverage of your design compared to my ways. The former language (VHDL) is too verbose and too cluttered to be efficient, and Verilog, IMHO, is a fundamentally flawed HDL full of pitfalls, as it gives you plenty of opportunity to write ambiguous RTL which may or may not yield functional implementations when transported from the CAD suite of one vendor to the CAD suite of another vendor. VHDL shields you from that, but is too verbose (needs too much code written) to specify the same functionality. If you don't believe me that Verilog is fundamentally flawed, ask any professional using it about the $$$$$ their company pays for all the "linters" sold by third parties which seek to identify dubious / ambiguous constructs in the Verilog source code. IMHO, any computer language which needs a "linter" is fundamentally flawed. This also applied to "C" in its early stages. Modern, powerful "C" compilers like GCC enforce stricter type checking and also check for common pitfalls with botched library function calls. The penalty paid for these powerful and welcome features is the obesity of these advanced "C" compilers after half a Century of development and refinements. But as a user, I don't have to write the 10's of millions of lines of code of such an advanced compiler. I just use it. So much for the pros / cons of the various programming language choices.

 

Now, after this machinery is in place, we can run the PLD based logic equations against the "C" model of the original IC, through the whole set of stimuli and test conditions, and we check that the output signals are equal. Once this works, a single compile switch will run the PLD based logic equations in lockstep with the original IC, to see if there is a difference in the response. Any difference means there is a bug in the RTL, so go back to STEP 5.

 

Here is a screenshot of a typical compare run of the "paper model" against the real IC (in this case, the MMU):

 

 

This is the "tail" of a protocol spanning about 250 kBytes (~ 250 printed pages) of text. So don't think my test suite is inadequate ;-)

 

After everything matches, we know:

 

paper model = original IC = PLD equations (as far as logic functionality is concerned)

 

In theory, all this could be done by writing only pure VHDL or Verilog from the beginning, without the test rig, and using plenty of test cases and assertions. But lacking the capability to run that logic in lockstep with the original IC, you did not actually check that your later hardware implementation of the substitute down the road has the same functionality as the original IC. Instead, what you did is to prove that whatever illusion / delusion or misconception you had about the functionality of the original IC is also present in your RTL and, later, in the hardware. So all your assertions may run just fine, but what you may get out in the end, won't work as a valid substitute, and correcting such errors much further down the road can mean a major detour, cost overrun, or even end the project (the worst case is if your 1st silicon plugged into the customer's IC socket is deaf, dumb, blind and / or dead).

 

My proven method revealed here goes a long way to avoid such disasters.

STEP 7: Generate ATVG test vectors for the PLD implementation and run them on the original IC

 

(Sorry, you can't do that as you don't have the proprietary tools for that, which I wrote in the late 1980s and early 1990s. I just include this step for the sake of completeness of the "revelation of the method". In lieu of ATVG, you can write test vectors by hand, if you want. But this is very tedious.)

 

My proprietary tools were commercialized at the end of the 1980s and in the early 1990s and became the market leader (and are yet another cornerstone of my financial independence). These tools blew all the competition out of the water, but, alas, thanks to progress along the semiconductor shrink path, were relatively short lived, as they could not possibly cope with the larger CPLDs having hundreds of macrocells, which started to appear in the early 1990s. For the simpler PLDs available when we launched the product, these tools were excellent, while the ATVGs of the competition already struggled with the humble 22V10 etc. But the exponential explosion of the state space with all too many flipflops could not be defeated. These more complex designs can only be tested with scan techniques, and this is how the industry tests large digital blocks in their ICs until today. But alas, the "scan mode" happens on a different circuit which appears only when the scan is enabled. It does not exercise the normal functionality of the IC from its input and output pins only. Scan test just looks for bad gates and stuck nodes. It does not know anything about the real world functionality of that IC.

 

In contrast, my ATVG algorithms performed a state machine analysis first, and then generated test vectors using no preload or scan function, but used only the primary inputs and outputs of the IC. Unlike what the so-called "published science" of Academia at the time could do, my own ATVG algorithms also worked for asynchronous state machines, or combinations of synchronous and asynchronous state machines, as long as these were implemented in PLDs. My tools also understood races and hazards and avoided them in the ATVG test vectors, and my "ambiguous delay timing simulator", also based on my own algorithms, would expose hazards hidden in botched designs. The whole flow could run fully automatically, beginning with a bare bones JEDEC fuse map file. The users of these tools loved them. One large customer gave us the feedback that our tool suite (which cost them five figures) had saved them millions in rework costs as it enabled them to weed out all the wonky and poorly testable designs in their vast PLD design portfolio (thousands of different designs, large corporation).

 

Why use an ATVG ? Are hand written test vectors / test cases not enough ?

 

The important point here is that the ATVG will exercise all the real world functions the IC has in the application, plus, possibly, unexpected cases, and thereby may produce input signal combinations and sequences which were not exercised in the 'paper model'. One example: if an address decoder is implemented as a sum-of-products function (typical for PLDs), then the ATVG seeking to test "stuck-at-1" nodes in the product terms will also exercise addresses with a Hamming distance of 1 from the specified address. The same is true for sets of control signals. This tends to look into nooks and crannies of the design not anticipated by the human designer, and may uncover surprises. The only downside is that the ATVG, not being able to read and understand the original specification / datasheet of the IC, may sometimes make use of "illegal" combinations of signals, e.g. a "read" and a "write" signal for the same target asserted at the same time. If these "illegal" conditions have been used in the logic reduction stage (ESPRESSO knows these cases as the "don't care set") then surprising, but bogus differences between the model and the real IC may arise. These cases need to be screened out by hand / human intervention. For ICs of modest complexity, there may be only a few such cases (e.g. half a dozen). This can be managed.
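
The Hamming distance argument can be illustrated with a small Python sketch. It is not the ATVG algorithm, just a toy under stated assumptions: the 16 bit width, the example address 0xC050 and both decoder functions are hypothetical, and a "stuck-at-1" literal is modelled simply as an address bit that the product term no longer looks at.

```python
WIDTH = 16
TARGET = 0xC050  # arbitrary example address decoded by one product term

def hamming_neighbours(address, width=WIDTH):
    """All addresses that differ from `address` in exactly one bit."""
    return [address ^ (1 << bit) for bit in range(width)]

def decoder(address):
    """Fault-free product term: true only for the target address."""
    return address == TARGET

def faulty_decoder(address, stuck_bit):
    """Same product term with the literal for `stuck_bit` stuck at 1."""
    mask = ((1 << WIDTH) - 1) & ~(1 << stuck_bit)
    return (address & mask) == (TARGET & mask)

neighbours = hamming_neighbours(TARGET)

# The target address alone cannot expose the fault: both decoders accept it.
assert decoder(TARGET) and faulty_decoder(TARGET, stuck_bit=3)

# The neighbour differing in bit 3 does expose it.
probe = TARGET ^ (1 << 3)
print(decoder(probe), faulty_decoder(probe, stuck_bit=3))
```

Each of the 16 neighbours exposes exactly one of the 16 modelled faults, which is why a generator hunting for these faults ends up probing addresses the designer never wrote a test case for.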

 

Interpretation of the fault simulation score.

 

The fault simulation score for all the stimuli produced by the 'paper model' vs. the fault simulation score produced by the ATVG is a measurement for the coverage of the functionality of all the test cases in the 'paper model'. If the 'paper model' had a good score in coverage (typically 90%-ish), then the ATVG may only add a few more test cases which exercise nooks and crannies of the functionality which the designer did not see or anticipate.
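
How such a score comes about can be shown with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: an 8 bit decoder for the hypothetical address 0xA5, a fault list containing only one stuck-at-1 fault per literal, and two made-up vector sets. A real fault simulator works on the full fuse map and a much richer fault model.

```python
WIDTH = 8
TARGET = 0xA5  # hypothetical address decoded by a single product term

def good(address):
    """Fault-free decoder."""
    return address == TARGET

def faulty(address, stuck_bit):
    """Decoder whose literal for `stuck_bit` is stuck at 1 (bit is ignored)."""
    mask = ((1 << WIDTH) - 1) & ~(1 << stuck_bit)
    return (address & mask) == (TARGET & mask)

def fault_score(vectors):
    """Fraction of the modelled faults exposed by at least one vector."""
    detected = 0
    for bit in range(WIDTH):
        if any(good(v) != faulty(v, bit) for v in vectors):
            detected += 1
    return detected / WIDTH

# Vectors a designer might write by hand: the hit plus two obvious misses.
hand_written = [TARGET, 0x00, 0xFF]
# The same set extended by the Hamming distance 1 neighbours of the target.
with_neighbours = hand_written + [TARGET ^ (1 << b) for b in range(WIDTH)]

print(fault_score(hand_written), fault_score(with_neighbours))
```

In this toy case the hand written set detects none of the modelled faults, while the extended set detects all of them. The gap between two such scores is what hints at functionality the test cases did not cover.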

 

Where it gets interesting is when these test vectors are then run in lockstep with the original IC, via the test rig. Are all the responses on the outputs still matching the vectors ? Any difference means that the case needs to be carefully assessed: is it a false alert (don't care / "illegal" cases, see above) ? What does it mean ? May it happen in the real world application ? If so, your substitute is still inadequate. Go back to Step 5.

STEP 8: (optional) Build a compare unit

 

Purpose of such a compare unit is to compare the output signals of the substitute with the output signals of the original IC running in the host system. So, there should be enough space on the adapter card to put the extra circuitry in.

 

The compare unit may be a completely separate PLD (or group of PLDs), depending on complexity. One of the key elements is to get the timing right - some signals may be available only at certain times, and this is why just a trivial XOR gate based compare followed by a latch won't cut it (in most cases).

 

When I planned the adapter card for the MMU / IOU substitute project, I allocated one DIL-40 socket for an EP910 PLD (with 24 macrocells) as the compare unit, adjacent to the socket for the original IOU. The screen memory scanner needs only 18 macrocells, so 6 macrocells are left for the compare function. So I was able to pack both the scanner and the compare into only one IC package. Nice !

 

Here is the schematic how this compare unit is wired to the IOU:

 

 

(The yellow markers are from the wire wrap construction, each wire put in is marked to be done).

 

In the above schematic, you can see where the Apple II scan counter (H[6:0], VC, VB, VA, V[5:0]) is located in the middle EP910, which is the compare unit. The compare unit receives the multiplexed addresses RA[7:0] from the original IOU (left hand socket both on the schematic and the PCB) and compares them to the ones it would generate by itself, depending on the graphics mode (soft switches SS_PAGE, SS_HIRES, SS_STO80 it receives from the MMU substitute on the right hand side of the schematic). If a difference is found (ERR1 and ERR2 signals) then either the ROWF or the COLF output gets asserted and lights up its LED. "ROWF" means a fault in the row address, and "COLF" means a fault in the column address.

 

Here is a photo of the actual hardware configuration at this step:

 

 

You can see that the MMU is already the substitute (1 x EP910 labeled "MMU", two 74S257 TTLs, two smaller PLDs) while the original IOU is still present. The compare unit is the EP910 without a label in the middle.

 

This is how the adapter card looks when inserted in the host system (Apple IIe):

 

 

You can see the flat band cable(s) connecting the adapter card to the IOU and MMU sockets on the motherboard, and the two fault indicator LEDs on top. The red wire wrap wires are the ones used to hook the middle DIL-40 socket up for the compare unit. These will be removed later when this place receives the EP910 which contains part of the IOU substitute. (But we are not there yet).

 

Any fault detected lights a LED and then freezes the increment of the scan counter in the compare unit, and it resumes counting only after the whole row / column video address pair matches. So after a brief initial disturbance after a graphics mode switch, the scan counter in the original IOU and the scan counter in the compare unit should run in lockstep, and the LEDs should be extinguished (not lit). Of course, the human eye could not see very brief faults lighting up a LED for one cycle lasting 1 microsecond. For "seeing" these events, we engineers have a useful electronic eye, called "oscilloscope". But having the LEDs is fine to see gross errors instantly without hooking up the instrument.
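
The freeze-and-resume behaviour can be sketched in Python as follows. This is a simplified model under stated assumptions: a single modulo counter stands in for the full row / column address pair, the function name and the modulus of 8 are hypothetical, and no real timing is modelled.

```python
def compare_unit(observed, modulus):
    """Track `observed` addresses with a local counter.

    Yields (expected, fault) once per cycle. On a mismatch the fault flag
    is raised and the local counter is frozen; it resumes counting only
    after the observed address matches the frozen value again.
    """
    local = 0
    for addr in observed:
        fault = addr != local
        yield local, fault
        if not fault:
            local = (local + 1) % modulus

# The "original" counter starts 3 counts ahead of the compare unit.
observed = [(3 + i) % 8 for i in range(16)]
faults = [fault for _, fault in compare_unit(observed, 8)]
print(faults)
```

After the initial disturbance (here five cycles with the fault flag set) both counters run in lockstep and the flag stays off, which corresponds to the extinguished LEDs.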

 

I expected no errors at all, because I had copy / pasted the PLD equations for the scan counter from my proven implementation in my Apple-1 color graphics card, whose "paper model" ran with a 100% match  against a simulation of the TTL logic seen in the original Apple II Reference Manual (this part of the work, the simulation TTL model, I had already done many decades ago).

 

Alas, what I got when running the adapter card with the compare unit in my Apple IIe was "Snake Eyes", a bad omen !

 

This faint glow of the LEDs (which would be invisible under daylight conditions and was hard to catch on a photo) indicates that something is wrong. There are brief periods of mismatch of the video memory addresses generated by the real IOU and the compare unit. Here are the photos:

 

 

A closeup:

 

 

So far I have not had the time to investigate this any further. It could be a copy / paste mistake I made, or a real difference between the TTL sequence and the real IOU. I will find out in due time. At the moment, I must focus on writing these posts before my good internet access goes away. But I promise, I will keep you updated as soon as I have had the opportunity to get to the bottom of this !

 

STEP 9: Put the substitute on the adapter card and run it in the host system

 

The purpose of this step is to see if the substitute does the same thing, in the actual host system, as the original IC does there. If you did a good job with the previous steps 1-8, it should work. It is recommended to run a lot of applications to gain confidence. My industrial customers typically had written their own diagnostics software for their systems, which would exercise everything. And of course, they would run all their applications software before they would sign off the substitute design.

 

But if there was an issue, then the ICE and / or the logic analyzer had to be thrown into the battle. This was a rare event, thanks to the power of my method, but it did happen. In all cases, it was either some unexpected and undocumented quirk of the original IC their software had exploited, sometimes knowingly, sometimes by happenstance, or some minor interface issue with the hardware adapter, which could be corrected by adding drivers / buffers or by replacing wonky flat band cable connectors. (Actually, when testing my MMU substitute, I had the latter case happen to me: an intermittent contact in the flat band cable header caused erratic behaviour of the Apple IIe, which made me question the PLD implementation first, before the real culprit was found.)

 

The power of the method

 

As far as I remember, I never had an embarrassing failure where my substitute logic design work was the culprit. As an example of what could happen despite using the method I revealed in this thread, I remember one typical case of a communications interface protocol IC substitute which failed at target system startup. The datasheet of the original IC made a difference between a "RESET" command token and a "CLEAR" command token, which I had implemented faithfully as seen in the datasheet, but it turned out that the software at the customer used only "CLEAR" and never issued a "RESET" token as prescribed by the datasheet. The actual original IC did the same for both commands, resetting all internal registers to startup defaults in both cases, but my substitute insisted on the use of the "RESET" command token to get to these defaults, and my "CLEAR" only cleared a buffer (or terminated a sequence of the protocol, as the datasheet called for). The software was fixed instantly, and then we had a full success. But I had to update my substitute due to the many legacy systems they had out in the field. Which meant another drive over several hundred miles to visit them a second time, to get the sign off.

 

Situation with the Apple II IOU/MMU replacement project

 

In case of this Apple II project, I don't have any diagnostics on floppy disk. And as another project of mine, an advanced floppy disk drive controller of my own design (called the "Ratweasel"), got stuck due to the crumbling old notebook computers I intended to use to drive the hardware, I currently don't have any means to put the Apple II disk images floating around on the web onto real floppy disks. This greatly hampers my ability to really test the IOU and MMU substitutes. (Any 'donation' of real floppy disks would be appreciated, just send me a PM via the "send PM" button below my name if you want to contribute example floppy disks to this project and if you live in the USA (the latter is due to postage reasons).)

 

For the MMU substitute, an initial test has already happened. I ran the game "Wings of Fury" on my Apple IIe using the MMU substitute without any fault. And a few other games. But I only have a few of those. The only one which did not run was "Centipede" on an original floppy disk by Atarisoft. It does not even load on this Apple IIe, but I think it loaded and ran fine on my Apple II clone. This needs to be investigated.

 

The IOU substitute design is completed, it comprises one EP910, one EP610, two smaller PLDs (P20L8 and P20R4) and a few TTLs (1x 74LS257, 1x 74LS00, 1x74HCT132). The latter is used for the key autorepeat functions and the cursor flash pulses. I chose to not implement these functions with digital counters (as is done in the original IOU). I prefer to be able to customize the timing of autorepeat and cursor blink rate by changing resistors.

I did not have the opportunity / time yet to wire wrap the IOU substitute onto the adapter card, as I first want to know where the glowing fault LEDs are coming from.

STEP 10: Getting the substitute design into production

 

After the PLD based substitute has been run through its paces by the customer, and is found satisfactory, they sign it off for production.

 

In some cases, this was just a PCB with the same PLD based solution (space permitting) which plugged into the socket of the IC it substituted, but in most cases it was turned into a gate array or standard cell IC that would drop directly into the socket, using the same type of IC package. The latter was only possible in the 1980s and early 1990s, when semiconductor companies offered gate arrays suitable for small production numbers. Japanese companies would make small production runs of their low end gate arrays based on just one photolithographic mask to be made (and paid for).

These runs typically would yield some 10000's of the IC per lot. Smaller numbers were not economically viable (due to the NRE costs) but one company, called "ES2", offered single wafer production runs based on a maskless ebeam lithography at very competitive prices (maybe $25k for 5000 packaged and tested ICs). This was a great opportunity (and the only chance for small players in the field to get their own standard cell ICs). Alas, ES2 closed their doors after a few short years. Maybe their prices were a bit too low to cover their operating costs ...

 

Transforming the PLD based design, for which plenty of test vectors were available due to the application of the method, typically took only two or three man-days, using a workstation at the semiconductor house. I did not drive the tools myself - this was done by an expert familiar with them, which makes sense. All I did is to sit nearby and watch the work flow, and answer any questions which did arise. All ICs made this way worked in 1st silicon to the full satisfaction of the customer.

 

We only had one case where we got a desperate phone call from a customer that the ICs didn't work. Well, I had run all the test vectors on a few specimens before we shipped the lot to them. It turned out that they had a bad, worn out IC socket in their target system. Which had been caused by our work with the PLD solution and its flat band cables and headers. (Be aware that this can happen, and that this is why it is so important to use headers with the right mechanical pin sizes: some press fit headers have pins that are slightly larger than the pins of the real ICs, and this may stress the contacts in the IC socket too much, so they become unreliable with real ICs having the slightly smaller pins found on typical leadframes).

 

So far the "Revelation Of The Method". Sorry for the length. But it had to be to preserve this battle proven method for posterity. Hope you can learn from these posts and, maybe, you will adopt this method for your own work. Good luck ! "May the force be with you all the time."

 

Now, everything having been said, this thread is open for comments !

 

Comments invited !

 

- Uncle Bernie

Offline
Last seen: 1 day 5 hours ago
Joined: Jul 5 2018 - 09:44
Posts: 2587
Thank you Uncle Bernie for all you've done and for sharing with the community.  I always find your posts to be a learning opportunity.  I haven't had time to digest all that, but it is great that it is there for posterity.

 

Intermediate report on the progress with my IOU substitute.

Hi Fans -

 

here are the first screen shots taken with my PLD based IOU implementation substituting the real IOU:

 

 

Please excuse the poor quality. It's due to overexposure by the old Y1998 digital camera I use.

Never found it necessary to buy another one, though. For normal snapshots in daylight it's quite good.

 

 

Wings of Fury. Actually, you see two frames of the game superimposed, as there is no way to sync the shutter in the camera to the video signal.

Yellow oval shows 'trash' bytes appearing prior to the 'official' start of the live video (meant is: prior to left hand edge of the 'live' screen area).

This is a bug that needs to be hunted down and killed dead.

 

 

In the last photo (above) you can see that I repurposed the LEDs from the compare unit (STEP 8 of the method, post #10 above in this thread) to display the screen mode. One LED is for the HIRES soft switch, and one is for the 80COL soft switch.

Surprise, surprise, it turned out that "WINGS OF FURY" turns on 80COL during the static title screen, so it may be a DBLHIRES screen. Alas, I can't confirm that yet with 100% confidence because with the PAL Apple IIe I use as a development platform, all the colors are wrong when seen on an NTSC TV. Actually, the mismatched 'PAL' video signal on an 'NTSC' TV causes a funny effect, as if the TV is on an LSD trip: the colors vary gradually all over the spectrum. This effect is visible in all the above photos. It is not a bug in my PLD designs. If I had an NTSC Apple IIe, the colors would be correct.

 

Here is a photo of the hardware:

 

 

The ICs in the green box are the IOU substitute, and the ICs in the red boxes are the MMU substitute. Note the now empty sockets for the original MMU and IOU. The PLD based substitutes are wired 1:1 to these now empty DIL-40 sockets. If the substitute ICs for one custom chip all are pulled out of their sockets, and the original custom IC made by Apple is plugged in, the slot card readily reverts to a 'before' state. This quick change technique is helpful if during development questions pop up which are best answered with the real custom ICs.

You can see that I ran out of PCB real estate, so I have no speaker output and no keyboard autorepeat yet, and was not able to implement all these slow moving functions with counters (this would have taken another 22V10, to get 10 more macrocells). Instead, I opted for an 'analog' implementation of these functions, based on RC timing delays and a 74HCT132 (quad NAND Schmitt triggers). I just did not put the electrolytics for that in, because they would interfere with the oscilloscope probing of all the IC pins on all the ICs. Which is work I still need to do (the purpose is to look if poor quality signals are hiding somewhere in the PCB, this step cannot be skipped).

 

This is the state of the work after only 3 days of work from wire wrap start !

 

I started and finished the wire wrap on Thanksgiving day and then tried to make it work over the weekend. From simulations of the RTL I expected it to work on first power up, but that was a fallacy. I had cut corners (to save time, hahaha, got the opposite effect) and so, 'saving time', I did not set up a full timing simulation using my proprietary PLD tools. As it turned out this was a mistake.

 

The 1st bug seen: vertical counter not counting !

 

The penalty for cutting corners (aka "saving time", which it didn't) was that nothing worked at first. There was no video signal. The vertical counter (in the EP610) did not count. But the same RTL had been copied from the "compare" unit in the EP910 (see STEP 8, post #10 above in this thread), where it did count and compare perfectly. As it turned out, believe it or not, the stuck counter was caused by some flipflops in it having no clock !

 

A nasty bug in the "battle proven" PLD design software was discovered !

 

It turned out that the 1980s era PLD design software I used for this work has a bug in its EP610 fusing model. And this bug manifests itself when writing "tidy" RTL and giving it an equation for the output enable product term like:

  

       pinname.OE = 1;

 

This is a habit of mine because some PLD design software suites (like early versions of PALASM) require this, otherwise, the tristate driver on the output pin stays in 'high impedance' and nothing works. So it is a recommended practice to write down these equations to enable the drivers. Other PLD design software suites (like ABEL) are 'smart' so they activate the output enable product terms automatically, once a logic equation is written for a pin. The software sees this and adds the equation for the OE term automatically. This then is an 'implied' feature of the design, which is bad, because it makes the RTL dubious when migration to other tools happens later.

 

The software bug turns the macrocell configuration for this pin into a 'Product Term' clocked flipflop, and the product term is constant "1" as demanded by the equation meant for the output enable. Alas, it does not go to the OE node, but to the CLK input of the flipflop in the macrocell. Consequently, the flipflop never flips or flops. This cost me 1 1/2 days to hunt down. It would have been so easy to avoid. After I had exhausted all the options in the lab, like tying gating pins to constants etc., I finally ran the JEDEC file of that EP610 through my proprietary software:
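
Why a constant "1" on the clock input freezes the flipflop can be seen in a small behavioural sketch in Python. This is an idealized model, not the EP610 macrocell: the class and function names are hypothetical, and any edge that might occur during power-up is deliberately ignored.

```python
class DFlipFlop:
    """Idealized positive-edge-triggered D flipflop."""

    def __init__(self):
        self.q = 0
        self._last_clk = None  # clock level not yet known

    def step(self, clk, d):
        # Q only changes on a 0 -> 1 transition of the clock input.
        if self._last_clk == 0 and clk == 1:
            self.q = d
        self._last_clk = clk
        return self.q

def toggles(clock_samples):
    """Count how often a toggle flipflop (D = not Q) changes state."""
    ff = DFlipFlop()
    count = 0
    for clk in clock_samples:
        before = ff.q
        ff.step(clk, 1 - ff.q)
        count += int(ff.q != before)
    return count

print(toggles([0, 1] * 4))  # a real clock with four rising edges
print(toggles([1] * 8))     # product term clock stuck at constant "1"
```

With a real clock the flipflop toggles on every rising edge. With the clock tied to constant "1" there is never an edge, so it never changes state.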

 

The test vector generator reported only 128 states reachable. Which is just the 5 bit horizontal counter plus 2 state bits in a control state machine which does the blanking and the sync periods. 7 bits, 128 states. No trace of the 256 additional states in the vertical counter (because it was stuck). This could have been an early warning, had I sicced my tools on the JEDEC earlier, but cutting corners I never ran these tools before I had exhausted all the other options to find the root cause. After I knew what was going on, a quick 'grep CK' on the report file of my tools indeed showed that none of the flipflops of the vertical counter having that pinname.OE = 1 equation had any clock. Those other macrocells where the OE equation was an active term with variables (to drive the RA[7:0] multiplexed address bus of the DRAMs at a certain point in time) had a clock. But being in the middle of the vertical counter, they never counted, too. Oh man. What a mess. Discovered a nasty bug in that PLD software after 35 years. Well, I did not use any EP610 ever before. The EP910 always was my favourite of the Altera EPLDs. I will soon move the RTL to another CAD suite anyways, where I can play with larger CPLDs (Lattice ispLSI family). Maybe I can shoehorn both the MMU and the IOU into one of them. Some of the ispLSI 'family' members are 5V powered devices.
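
The idea of spotting a stuck counter from the number of reachable states can be sketched in Python. This is only a toy under stated assumptions: the counter widths are deliberately tiny and hypothetical, the control state machine is left out, and the breadth-first search here has nothing to do with the actual algorithms in the proprietary tools.

```python
from collections import deque

H_BITS, V_BITS = 3, 2  # deliberately tiny, hypothetical counter widths

def next_state(state, vertical_clocked):
    """One clock step of a horizontal counter that cascades into a vertical one."""
    h, v = state
    h_next = (h + 1) % (1 << H_BITS)
    if vertical_clocked and h_next == 0:  # horizontal rollover advances V
        v = (v + 1) % (1 << V_BITS)
    return (h_next, v)

def reachable_states(vertical_clocked, start=(0, 0)):
    """Breadth-first search over all states reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        nxt = next_state(queue.popleft(), vertical_clocked)
        if nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen)

print(reachable_states(vertical_clocked=True))   # healthy design
print(reachable_states(vertical_clocked=False))  # vertical flipflops without clock
```

In the healthy toy design all 32 states are reachable. Without a clock on the vertical flipflops only the 8 horizontal states remain, the same kind of shortfall as the 128 states reported above.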

 

After the fix - (almost) a full success !

 

After the fix was in (removing all the 'pinname.OE = 1' equations in the EP610 source file), the vertical counter started the count, and a video signal appeared on the oscilloscope for the first time with this IOU substitute. I then took the Apple IIe from the lab down to the living room where the CRT TV is, and turned it on. Alas, there was yet another bug:

 

Instead of the expected text screen, I got a LORES graphics screen with colorful blocks at the places where characters should be. This is obviously a fault in the SEG[A:C] or RA[10:9] signals for the video ROM, which needs further investigation. I just had copied the logic for these signals from Jim Sather's book and hence, expected no trouble, and did not simulate them at all. Maybe I have overlooked a small inversion bubble somewhere.  Coincidentally, this Apple IIe owner who initiated the following thread has seen the same fault (LORES instead of TEXT) and blames the (original) MMU in his Apple IIe for possibly being defective:

 

https://www.applefritter.com/content/mmu-problem

 

I did not take any screen shot from my version of the 'LORES instead of TEXT' problem, as it looks exactly the same as the one in said thread. So it might be that this "MMU" problem actually is an IOU problem, because some SEG[A:C] or RA[10:9] output from the IOU turned bad. Just saying ! But I digress. Back to my own work.

 

The good, the bad, and the ugly !

 

HIRES graphics and DBLHIRES graphics do work, as verified by running "WINGS OF FURY", which I think is the best test for a 128k Apple IIe, other than the various 'diagnostic' floppy disks, of which I don't have any. "WINGS OF FURY" exercises both the MMU and the IOU quite extensively and does a lot of memory bank switching.

 

But there is still an imperfection (see yellow oval in 2nd photo above): something is wrong with the blanking signal generation (meant is the !WNDW signal, original IOU pin #38). Seems it gets turned on too soon, and so some trash bytes appear prior to the "official" left edge of the visible / live screen area.

 

Conclusion and outlook.

 

So overall, two days into the debugging phase, the IOU substitute looks good, except for the mentioned two bugs (LORES in lieu of TEXT, and the trash bytes), which I hope to hunt down and kill in the next few days.

 

So far the intermediate report of my progress with my PLD based MMU and IOU substitutes.

 

Stay tuned !

 

- Uncle Bernie

Offline
Last seen: 21 hours 4 min ago
Joined: Mar 10 2023 - 21:36
Posts: 56
/WNDW, timing and LORES

I had the same problems in my implementation with /WNDW being too early. In the schematics, this signal is gated by P_PHI_2 (IOU schematics page 1, square C-2, component M-8). I think back then electronics were much slower than today and they needed to raise /WNDW earlier to compensate (P_PHI_2 is HIGH for one 14M clock before and one 14M clock after a PHI_0 rising edge, if I remember correctly).

In my implementation, I gated /WNDW to the rising edge of PHI_0 and with this, had the same timings as the official IOU. It's unfortunate that the Apple II series relies so much on async designs.
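
One possible reading of "gated to the rising edge of PHI_0" is that the raw signal is passed through a flipflop clocked by PHI_0. The Python sketch below shows only that idea. The sample lists are made up, the function name is hypothetical, and it is not taken from either implementation discussed in this thread.

```python
def register_on_rising_edge(clock, signal, initial=1):
    """Return `signal` as seen through a flipflop clocked by rising edges of `clock`."""
    out = []
    q = initial
    prev = clock[0]
    for clk, s in zip(clock, signal):
        if prev == 0 and clk == 1:  # rising edge: take over the current input
            q = s
        prev = clk
        out.append(q)
    return out

# Hypothetical samples, one entry per time step.
phi0 = [0, 0, 1, 1, 0, 0, 1, 1]
raw_wndw = [1, 0, 0, 0, 0, 0, 0, 0]  # active-low signal falling one step early

aligned = register_on_rising_edge(phi0, raw_wndw)
print(aligned)
```

The early falling edge of the raw signal is held back until the next rising edge of the clock, so the output changes only in step with PHI_0.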

 

As for the LORES problem, I'm very curious to know what it could be. As I wrote elsewhere, I only had this when I forgot to place a pull-up resistor on /RESET.

 

I think you are very close to being finished.

Keep us posted with your progress!

 

Now that I think of it, I can do a quick update on my implementation as well: We have some issues with the production of the release candidate units. Nothing bad, just more delays. Once we have these units, we'll carefully re-test everything and then the retail units will be produced and released. I hoped this would happen in time for Christmas, but early next year looks more realistic.

More progress with the PLD based IOU substitute !

In post #15, 'frozen signal' wrote:

 

 " I think back then electronics were much slower than today and they needed to raise /WNDW earlier to compensate (P_PHI_2 is HIGH for one 14M clock before and one 14M clock after a PHI_0 rising edge, if I remember correctly). "

 

Uncle Bernie answers:

 

You are correct, back in the day, digital ICs were slower, and so they used tricks to compensate for that. Alas, when PLDs (and other digital ICs) got faster and faster, this wrecked a lot of industrial designs which had been in trouble free production for years. By sheer coincidence, I wrote up something about this topic for this post on my progress (I prepare these at home, and then drive to the public library, where there is free WiFi. But it's not a matter of money, I could afford internet at home, but what I don't want is the lack of anonymity with that).

 

Thank you very much for the hint with the WNDW signal timing. As you can see below, there indeed was a bug in my RTL which caused the WNDW signal to be one clock cycle too early. This came about due to the partitioning of the design into several PLDs, for which I had to add some simple state machines which reconstruct internal states of another PLD in the PLD where they need to be known - this need of system partitioning and the lack of sufficient pin numbers  is the downside of using PLDs for this kind of work. But it has upsides, too, such as the immediate access to every inner node for the oscilloscope. No 'buried' signals !

 

I think your hint will save me the time to unplug all the PLDs in the IOU substitute to take measurements of the WNDW timing of the real IOU.

Thank you very much again as I always appreciate if somebody saves me some of my time.

 

Here is what has transpired since my last post:

 

The 'LORES instead of TEXT' bug had a ludicrous root cause. It took me 5 minutes to analyze and 5 more minutes to fix it.

 

Here is how it was done: a quick probing of the state of the IOU soft switches with the oscilloscope showed that the SS_TEXT soft switch was not set after power up, which the firmware attempts to do (as long  as a keyboard is connected). This probing was possible because all the PLDs I have chosen for this first iteration of the design have no buried macrocells. Buried macrocells are bad for debugging, because they can't be probed. This is why I avoid so-called CPLDs having buried macrocells, as long as avoiding them is possible. Once the design is fully debugged and probed, it does not take too long to migrate the RTL to a larger CPLD where all the functions not needing an output pin can go into buried macrocells, and this is where buried macrocells have a great advantage. But for debugging, no. FPGAs are even worse - if you need to observe an inner node, you need to change the RTL to route that node out to a pin. This involves another logic synthesis and fitter run and consequently, the circuit in the modified FPGA is not the same anymore as before. This can make timing related problems go away, just to make them reappear when the observability mod is taken out again. And these modification spins take a lot of time, too.

 

FIXING THE TEXT BUG

 

So we have learned that the SS_TEXT soft switch could not be flicked into the "ON" position. What was the root cause of that ?

 

You will never guess it !

 

It was no design bug at all. I had simulated this small mod to the RTL very thoroughly because it involves many moving parts: the A0 address bit must be grabbed from the multiplexed address bus (as in the real IOU, which does not have enough pins on its IC package). Then the EP910 receives two signals, SSEL[1:0], which specify which of the three soft switches within the EP910 must be flicked, or whether they all stay the same. The SSEL[1:0] signals come from the PLD (a 20V8 emulating a P20R4) which holds the other IOU related soft switches and their readback. And this one gets a clue from the address predecoder/encoder PAL in the MMU section, which had to be updated to encode the token for SS_TEXT and SS_MIX, too. So, many moving parts, and the RTL was updated, synthesized, and simulated thoroughly, and it worked just fine in simulation.

 

So what went wrong ? Why did the SS_TEXT soft switch not flick "on" ?

 

A timing bug not seen in the simulation ?

 

The actual root cause is quite ludicrous. I just had forgotten to reprogram that predecoder GAL in the MMU section with the most recent JEDEC file. So it still contained the previous version which of course did not decode/encode the addresses for SS_TEXT and SS_MIX yet. I reprogrammed this GAL and then my Apple IIe worked, with text mode, mixed screen mode, etc.

 

This forgetfulness is sloppy. There should be a checklist which lists all the PLDs that have been updated with a new revision of their JEDEC, and a check box to see if they have been reprogrammed / replaced in the prototype, even for hobby projects. The industry uses such methods to keep track of revisions of any module in a design. The lesson learned is that hobbyists should do that, too.
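A revision checklist of the kind described above could also be kept by a small script. The following is only a sketch under assumptions: the device names and the JEDEC contents are made up for illustration, and comparing SHA-256 digests is just one possible stand-in for the paper check box, not a method taken from any particular industrial flow.

```python
import hashlib

def digest(jedec_bytes: bytes) -> str:
    """Fingerprint of a JEDEC file's contents."""
    return hashlib.sha256(jedec_bytes).hexdigest()

def stale_devices(current_files: dict, programmed: dict) -> list:
    """Names of PLDs whose chip was programmed from a different (or no) revision.

    current_files: device name -> contents of the latest JEDEC file
    programmed:    device name -> digest recorded when the chip was last programmed
    """
    return sorted(name for name, data in current_files.items()
                  if programmed.get(name) != digest(data))

# Hypothetical example data; real use would read the .jed files from disk.
latest = {
    "mmu_predecoder": b"JEDEC rev B (decodes SS_TEXT and SS_MIX)",
    "iou_softswitch": b"JEDEC rev A",
}
checklist = {
    "mmu_predecoder": digest(b"JEDEC rev A"),   # chip still holds the old revision
    "iou_softswitch": digest(b"JEDEC rev A"),
}
print(stale_devices(latest, checklist))
```

Any device that shows up in the printed list still needs to be reprogrammed before the prototype is powered up again.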

 

(But this mishap of mine was not as bad as what happened to another team in a company I once worked for: they had a few minor bugs in the 1st silicon of their IC, and proceeded to do a 2nd fab spin without the bugs, based on a few minor mask revisions ('poly up'). When the 2nd silicon arrived, it had the same bugs as the 1st silicon. The IC was put under an electron microscope to see what had happened. Oooops - instead of the 2nd, bug fixed version, they had sent the now obsolete 1st silicon GDS file to mask fabrication again, and consequently, their 2nd silicon was the same as the 1st silicon, with the same bugs. Four months and the added mask and fab costs lost for nothing. In this context, my mistake of not updating that GAL, which cost me 5 min to fix, does not look as bad ! But I was lucky enough to remember that this GAL in the MMU had been changed. Otherwise, the search for the root cause would have taken much longer.)

 

Here is a screenshot from the first time I saw a text screen made by the IOU substitute:

 

         

 

It still has some faults: there are trash characters appearing before the start of the  'official' character field (yellow oval). And there is yet another issue:

 

Now, having a text screen and being able to type in keystrokes, I was able to show that the last character of a text line also was missing. This is the character 'E' visible in the above photo. It should have been located at the end of that text line, but instead, it appeared further down the screen, in the column just before the 'official' start of the text lines (red circle).

 

So the diagnosis is simple: the WNDW signal comes one video byte cycle too early and ends one video byte cycle too early. A delay by one video byte cycle would fix it. Or so I thought.

 

FIRST ATTEMPT TO FIX THE TRASH BYTE BUG

 

So in order to add the one cycle delay, the equation of !WNDW was changed from this:

 

        !WNDW := V4 & V3                      " vertical blank period
               # (hst_state != HST_LIVE);     " horizontal blank period

 

 to this:

 

        !WNDW := V4 & V3                            " vertical blank period
               # (hst_state == HST_HBLO)            " begin of horizontal blank period
               # !WNDW & (hst_state != HST_LIVE)    " keep !WNDW = 1 until ...
               # !WNDW & !H0;                       " ... 2nd byte in live scan.

 

The 'hst_state' is from a two flipflop / four states  state machine in the EP610 which conveys the information about where the horizontal counter stands (LIVE, LOAD, HBLANK ONLY, HBLANK WITH HSYNC). This is all the information the EP910 gets to know where the electron beam is on the screen.
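The effect of the two !WNDW equations can be tried out in a small behavioural model. This is only a sketch under assumptions: the scan line below is a toy (twelve byte cycles instead of the real count), the order of the states (LOAD directly after LIVE, then HBLO) and the H0 value in the first LIVE cycle are guesses chosen so that both edges move, and V4 & V3 is simply held at 0. Only the two equations themselves are taken from the text.

```python
LIVE, LOAD, HBLO = "LIVE", "LOAD", "HBLO"

def eq_before(n_wndw, state, h0, vbl):
    # !WNDW := V4 & V3 # (hst_state != HST_LIVE)
    return vbl or state != LIVE

def eq_after(n_wndw, state, h0, vbl):
    # !WNDW := V4 & V3 # (hst_state == HST_HBLO)
    #        # !WNDW & (hst_state != HST_LIVE) # !WNDW & !H0
    return (vbl or state == HBLO
            or (n_wndw and state != LIVE)
            or (n_wndw and not h0))

def run(equation, cycles, vbl=False):
    """Clock the registered equation once per byte cycle; return !WNDW after each clock."""
    n_wndw = True                      # assume we start inside a blanking period
    trace = []
    for state, h0 in cycles:
        n_wndw = bool(equation(n_wndw, state, h0, vbl))
        trace.append(n_wndw)
    return trace

# Toy scan line (assumed ordering): 2 x HBLO, 6 x LIVE, 1 x LOAD, 3 x HBLO.
STATES = [HBLO, HBLO] + [LIVE] * 6 + [LOAD] + [HBLO] * 3
LINE = [(state, i % 2) for i, state in enumerate(STATES)]   # H0 toggles every cycle

before = run(eq_before, LINE)
after = run(eq_after, LINE)
for name, trace in (("before", before), ("after ", after)):
    print(name, "".join("#" if blank else "." for blank in trace))
```

With these assumptions the second trace is the first one shifted right by one byte cycle ('#' means blanked). Whether the real hst_state sequence behaves the same way depends on the actual state ordering in the EP610.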

 

After this fix was applied, the text screen looked good:

 

 

You can see that all the characters appear where they should be, the line rolls over correctly, without losing any character in the 'ABCD...' sequence I typed in.

 

The test with "WINGS OF FURY" followed:

 

 

Alas, there still are a few small lingering pixels before the 'official' start of the active screen (yellow oval). Compared to what I had before (see 2nd photo of post #14 above in this thread), some progress has been made.

 

But it seems that the WNDW signal still turns on the video ROM a little bit too soon. How can that be ? It is clocked with the same RASRISE1 signal as seen in Jim Sather's book. Seems that in my IOU substitute, the whole signal path for RASRISE1 to WNDW is a little bit faster than what the Apple IIe motherboard expects, and consequently, a few spurious pixels still appear which shouldn't be there.

 

This is the sort of problem where the adapter card concept shines: I just have to pull all the IOU PLDs and plug the original IOU in, to measure with an oscilloscope where exactly its WNDW signal gets asserted and deasserted. Then I can try to duplicate the same timing in the PLD substitute.

 

(Thanks to the comment of 'frozen signal' in post #15, I was spared this IC pulling and the measurements. Here are some thoughts of mine on 'adding delay' which may be of interest for 'frozen signal', too --- "It's a trap !").

 

Alas, the chances of adding a robust delay with PLDs are slim. It ought to be done with a clock edge, because combinatorial delays are gradually going away as PLDs and CPLDs get faster and faster with increasing year of manufacture. This is an inevitable consequence of their manufacturers migrating the PLDs down the CMOS shrink path to get smaller die sizes and lower costs, allowing for more aggressive pricing. This also makes the PLDs faster (in the 1980s, -35 and -25 parts were common, and in the 1990s, -10 and -7 became available, the latter meaning 7ns propagation delay through the whole PLD, from input to output).

And this broke many industrial designs which had been in production for many years, especially when these designs had partitioned larger state machines into several PLD packages, or had state machines in different packages talk to each other while being driven from the same clock net. This works fine with slower PLDs, but the faster they get, the more the assumption of having a fully synchronous system goes out the window, until nothing works anymore. (Fully synchronous systems are an i d e a l which works on paper, but does not exist in the real world. Real world effects turn a l l allegedly 'synchronous' systems into p l e s i o c h r o n o u s systems, and when the combinatorial logic between the flipflops gets fast enough, the whole theory on which synchronous state machine design is based falls apart. This means it falls apart in the real world, where the "rubber meets the road"; it does not fall apart on paper, where the theory of the i d e a l synchronous state machine as such is still valid. But what is it good for when it doesn't work anymore in the real world ??? A lot of tricks had to be added to the CAD software which the semiconductor industry uses to design digital blocks in full custom ICs up to this day ... but it's a mess of kludges which is reaching its limits. Hence the renewed academic interest in asynchronous / self timed systems, none of which are truly ripe yet for general adoption as a new digital design paradigm.)

 

This is yet another reason to work with the slowest ICs you can find which still are fast enough for the job at hand. "The greed for speed" is not only a killer on the road and in the air, but also has killed many promising systems designs.
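Why faster parts can break a design that used to work can be illustrated with the usual hold time check between two flipflops in different packages on the same clock net. This is a minimal sketch, and all numbers in it are made up for illustration; they are not data sheet values of any particular PLD.

```python
def hold_margin_ns(t_co_min, t_logic_min, t_skew, t_hold):
    """Hold margin at the receiving flipflop, in ns (negative = violation).

    New data launched by the sending flipflop arrives t_co_min + t_logic_min
    after the sender's clock edge. The receiver, whose clock edge arrives
    t_skew later, needs its old data to stay stable for t_hold after that edge.
    """
    return (t_co_min + t_logic_min) - (t_skew + t_hold)

# Hypothetical numbers: two PLD generations on one clock net with 3 ns skew.
slow = hold_margin_ns(t_co_min=8.0, t_logic_min=0.0, t_skew=3.0, t_hold=0.0)
fast = hold_margin_ns(t_co_min=2.0, t_logic_min=0.0, t_skew=3.0, t_hold=0.0)
print(f"slow parts: {slow:+.1f} ns, fast parts: {fast:+.1f} ns")
```

With the slow parts the margin is positive; with the fast parts the new data overtakes the skewed clock and the receiver may capture the wrong state, although nothing in the logic equations has changed.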

 

This is the current state of my work on the IOU substitute. I hope I can eliminate the last quirks in the next few days.

 

Comments invited !

 

- Uncle Bernie

Offline
Last seen: 21 hours 4 min ago
Joined: Mar 10 2023 - 21:36
Posts: 56
Black Magic will show the issue better (I hope)

By looking at your screenshot, I think you still have a timing issue with /WNDW. No need to plug your logic analyser; just run Black Magic. This is what was displayed before I fixed my /WNDW problem:

Look to the left of the nose and mouth; this is what should be displayed right of the bird (the white thing next to the nose is part of the wing and the brown thing below is part of the path).

 

I also found that the game ALIENS was very good at showing other timing problems. It's a double hires game, and timing problems seem to impact this game more than the others. It has been a very useful tool to me.

 

But at first glance, I'd say it's just /WNDW still needing a slight adjustment.

2nd round of fixing WNDW !

In post #17, frozen signal wrote:

 

" But at first glance, I'd say it's just /WNDW still needing a slight adjustment. "

 

Uncle Bernie comments:

 

You are right. This WNDW signal is the most rotten part in my design. The problem I have with it is that I can't just copy the equations in Jim Sather's book for HBL and VBL (which I suppose are correct, hopefully). Due to the partitioning of the logic over different PLDs, I had to make compromises and had to deviate a lot from the way Apple does it (as per Jim Sather's analysis). It would be so easy if I had all the horizontal and vertical counter bits available in the EP910, but the only counter bit that's there is H0. Everything else must be derived from the HST_STATE and the V[4:3] vertical counter bits, which are fed to the EP910 verbatim from the EP610. This complicates matters, and it is easy to make mistakes with WNDW.

 

Here is what transpired this morning:

 

Being too lazy to move the oscilloscope down from the lab (it is hooked up to some other experiment which I don't want to disturb and then set up again), I use my "1 cent oscilloscope" to make the WNDW signal visible.

 

This is how it works:

 

in any computer generating video, you can hook up a resistor (costing 1 cent) to the video summer, and the other leg of the resistor is the probe tip of your ultra cheap ( 1 cent) oscilloscope.

 

By probing signals in the circuit, you can now see the relationship of that signal to the video signal. Which, for debugging the video generation section (in the Apple-1 called the "Terminal Section") is just fine enough to "see" issues.

 

Here is how the 1 cent oscilloscope looks:

 

 

I hooked up a 20 kOhm resistor to the video summing node (base of transistor Q1 in the Apple IIe). And with the other end I can go probing various signals in the circuit. But in this case, it's the !WNDW signal on pin #38 of the original IOU (empty socket in this case).

 

What happens is that when the !WNDW signal goes low, meaning the active character field, the video signal is being pulled to a "blacker than black" level. This is also the reason why the resistor can't be made too low ohmic. If the video signal goes too low, the TV will interpret this as a SYNC, and the picture will fall apart. But with the right resistor value, all is fine. Here is how the text screen looks now:

 

 

You can see where the WNDW signal turns a black video signal "blacker than black". This is a term of video tech insiders: correct adjustment of the brightness of a TV means that even in "normal" black areas, there is just a faint excitation of the phosphor. This is called "black". "Blacker than black" means that the electron beam is fully turned off, which, in normal operation, should be avoided.

 

Ignore the color spectrum - this is an ill effect from hooking a PAL Apple IIe to a NTSC TV set (my IOU substitute of course makes a NTSC signal as far as the timing is concerned, but the PAL Apple IIe does not cooperate). Important to note is the magenta vertical bar at the left edge of the WNDW and the blue vertical bar at the right edge of WNDW. These again are color artifacts but they convey information where exactly the WNDW signal edge lies within the NTSC color circle. The stickers with the marks on them also capture where these bars are (they have a parallax  due to the camera position, in the real world they are exactly on the color bars).

 

After the mod (clocking WNDW with rising edge of PHI0 instead of rising edge of !RAS in PHI0 = 0, as suggested by frozen_signal) the text screen looks like this:

 

 

You can see that the magenta vertical bar has changed to blue. This corresponds to one M14 clock added delay. The same effect is exploited by the PHASE bit in Apple II HIRES color graphics. The blue vertical color bar which was on the right hand side in the previous photo has disappeared. Camera position was about the same, so we can conclude from the marks on the paper label that WNDW now changes a bit later, by the width the blue color bar had. This is a rough estimate, though, because the NTSC/PAL converter in the PAL Apple IIe manipulates the signal somewhat. To compensate, I typed in some characters. Now it is evident that WNDW turns off in the midst of the 2nd last character of a screen line.  This means that at this point, 1.5 characters are still in the video pipeline. Which, I think, is about right because of the stages in the video pipeline (more on this below).

 

The spurious pixels in the HIRES mode are gone now, too:

 

 

Alas, in this "WINGS OF FURY" screenshot I can see that the last HIRES screen byte is blanked out too soon (see yellow circle).

Sigh. Yet another bug in the WNDW signal !

 

If you look at the equations in post #16 above, you can see why: the V3 & V4 product term fires as soon as the vertical counter is incremented past the last HIRES screen line, but in the Apple II, this happens while the last HIRES byte is still in the video pipeline.

 

 

So I have to change that equation again, to fix this issue.

 

BTW, this is a sort of bug I can't easily see in the simulations because the exact timing in the video pipeline is not being simulated. Of course, said timing could be found out, but who has the time for such investigations. A quick glance at the Apple IIe schematics shows that the video byte read from the screen buffer first goes into a pipeline register (74LS374, UE7) and then through the video ROM (UE9) and then gets loaded into the 74166 shift register UC13, which is clocked with 14M. How fast it shifts and when exactly it is loaded depends on the signals from the timing HAL, UD11.

 

Jim Sather's book explains it all. He has nice timing diagrams on all of that. Which I of course did ignore, because in the EP910 I don't have any of the scan counter bits (except H0) available. So I can't use his equations and still need to somehow derive the WNDW signal from whatever is available within the EP910, which is not much. But I think in the next iteration I will finally get the WNDW timing right.

 

Stay tuned !

 

- Uncle Bernie

More progress with the IOU substitute !

Some progress has been made since my last post above.

 

The (hopefully) 'final fix' to the WNDW signal timing problem took me 5 minutes (including booting up the computer with the PLD design software) and then 5 minutes more to program another EP910 and plug it into the adapter card (this 10 minutes spin around time is typical for a PLD design iteration fixing a trivial bug, but full custom ICs are 3-9 months per 'spin', so these indeed need much, much more scrutiny in the simulations, keep that in mind). Bingo, the last HIRES byte now is visible !

 

The fix was simple, here is the updated WNDW equation:

 

!WNDW := (hst_state == HST_HBLO)         " begin horizontal blank period

       # !WNDW & (hst_state != HST_LIVE) " keep !WNDW=1 (blanking) until...

       # !WNDW & H0                      " ... at 2nd byte in live scan.

       # !WNDW & V4 & V3;  " keep blanking active in vertical blank period

 

 

Compare that to the last WNDW equation in post #16. The vertical blanking period V4 & V3 now is not able to directly set !WNDW = 1 anymore (which blanked out the last HIRES byte). Instead, in the new equation, the vertical blanking period V4 & V3 will only extend a horizontal blank as long as V4 & V3 == 1. (The polarity of H0 also had to be modified, as its clock phase was changed as per post #18.)

 

Now, all the blanking on / off timing is controlled by the horizontal blanking only, which had been fixed in the previous equation. Since the vertical counter changes state just before the end of a scan line, which is brought about by the horizontal SYNC, the V4 & V3 change only happens outside the "live" video area. Problem solved.
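The difference between "V4 & V3 sets the blanking" and "V4 & V3 only extends the blanking" can be seen in a small behavioural model. It is a sketch under assumptions, not a simulation of the real design: the scan line is a toy with twelve byte cycles, the state ordering and the H0 phase are guesses, and the moment at which V4 & V3 becomes 1 (the cycle right after the last LIVE cycle) is chosen only to provoke the effect. The "old" equation is the earlier one with the H0 polarity aligned to the new one, so that only the vertical term differs.

```python
LIVE, LOAD, HBLO = "LIVE", "LOAD", "HBLO"

def old_eq(n_wndw, state, h0, vbl):
    # V4 & V3 as a stand-alone term: vertical blank can start blanking by itself
    return vbl or state == HBLO or (n_wndw and state != LIVE) or (n_wndw and h0)

def new_eq(n_wndw, state, h0, vbl):
    # V4 & V3 only together with !WNDW: it can hold blanking, but never start it
    return (state == HBLO or (n_wndw and state != LIVE)
            or (n_wndw and h0) or (n_wndw and vbl))

def run(equation, cycles, n_wndw=True):
    """Clock the registered equation once per byte cycle; return !WNDW after each clock."""
    trace = []
    for state, h0, vbl in cycles:
        n_wndw = bool(equation(n_wndw, state, h0, vbl))
        trace.append(n_wndw)
    return trace

# Toy scan line (assumed ordering): 2 x HBLO, 6 x LIVE, 1 x LOAD, 3 x HBLO.
STATES = [HBLO, HBLO] + [LIVE] * 6 + [LOAD] + [HBLO] * 3
H0 = [1, 0] * 6
# Last visible line: V4 & V3 turns on at cycle 8, right after the last LIVE cycle.
last_visible = [(s, h, i >= 8) for i, (s, h) in enumerate(zip(STATES, H0))]
# A line fully inside the vertical blank period.
in_vblank = [(s, h, True) for s, h in zip(STATES, H0)]

old_trace = run(old_eq, last_visible)
new_trace = run(new_eq, last_visible)
for name, trace in (("old", old_trace), ("new", new_trace)):
    print(name, "".join("#" if blank else "." for blank in trace))
print("vbl", "".join("#" if blank else "." for blank in run(new_eq, in_vblank)))
```

In this toy the old form starts blanking one byte cycle earlier on the last visible line, while the new form waits for the horizontal blank and then stays blanked for every line inside the vertical blank period.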

 

ABOUT SIMULATION DEPTH ("WHAT TO SIMULATE OR WHAT NOT TO SIMULATE ?" --- adapted from Shakespeare)

 

These are the little teething problems when designing a substitute IC based on insufficient information about the target system. I always had a hunch that the WNDW signal may be a troublemaker, and would need some (small) design iterations to get it right.

 

The whole design is a bit more complex than you might think. My horizontal counter does not count 0, 0, 1, 2, 3, ... as the horizontal counter in the Apple II or IIe does. Instead, it counts such that it directly generates the row addresses without the 4-bit adder used in the Apple II and the Apple IIe IOU.

 

This is because PLDs are terribly inefficient when an attempt is made to synthesize an adder. The LSB of the sum alone needs 4 product terms (3 bits in, 8 possible bit patterns, 4 of which yield a '1' output). Since there is no carry chain, the number of product terms doubles with each sum bit. The MSB of a 4 bit sum has 64 product terms. There is no PLD having that many product terms per macrocell.
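The growth of the product term count can be checked with a short script. This is a sketch under assumptions: it builds one straightforward two-level (sum of products) expansion of each sum bit of an adder with carry in, by flattening the carry chain, and it does not claim that this expansion is the absolute minimum a logic minimizer could reach. The function names are made up for this example.

```python
from itertools import product

def and_lits(terms, *lits):
    """AND every product term in `terms` with the given literals."""
    return [term | frozenset(lits) for term in terms]

def flattened_sum_sops(n_bits):
    """One sum-of-products per sum bit, with the carry chain fully flattened."""
    carry = [frozenset({("cin", 1)})]      # SOP of the carry into bit i
    ncarry = [frozenset({("cin", 0)})]     # SOP of its complement
    sops = []
    for i in range(n_bits):
        a1, a0 = (f"a{i}", 1), (f"a{i}", 0)
        b1, b0 = (f"b{i}", 1), (f"b{i}", 0)
        # sum = a xor b xor carry, written as four groups of product terms
        sops.append(and_lits(carry, a1, b1) + and_lits(carry, a0, b0)
                    + and_lits(ncarry, a1, b0) + and_lits(ncarry, a0, b1))
        # carry out = majority(a, b, carry); its complement is the majority
        # of the complemented inputs
        carry, ncarry = (
            [frozenset({a1, b1})] + and_lits(carry, a1) + and_lits(carry, b1),
            [frozenset({a0, b0})] + and_lits(ncarry, a0) + and_lits(ncarry, b0),
        )
    return sops

def evaluate(sop, values):
    return any(all(values[name] == level for name, level in term) for term in sop)

def matches_real_adder(n_bits):
    """Brute force check of every sum bit against a + b + cin."""
    sops = flattened_sum_sops(n_bits)
    for a, b, cin in product(range(2 ** n_bits), range(2 ** n_bits), range(2)):
        values = {"cin": cin}
        for i in range(n_bits):
            values[f"a{i}"] = (a >> i) & 1
            values[f"b{i}"] = (b >> i) & 1
        for i, sop in enumerate(sops):
            if evaluate(sop, values) != bool(((a + b + cin) >> i) & 1):
                return False
    return True

term_counts = [len(sop) for sop in flattened_sum_sops(4)]
print(term_counts, matches_real_adder(4))
```

For this particular expansion the counts come out as 4, 12, 28 and 60 product terms for the four sum bits, so they roughly double per bit and the MSB lands in the same ballpark as the figure quoted above, far beyond what a single macrocell of a classic PLD offers.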

 

One approach to mitigate the adder problem is to build a ripple carry chain, so for each sum bit using one macrocell, there would be a carry out bit in another macrocell. Both fit within four product terms each. This is a nice fit for some CPLDs like the AMD MACH family which offer 4 product terms per macrocell without invoking product term sharing between macrocells. But to make such a 4 bit adder in a 'classic' PLD, it would consume a whole 20L8. This is a waste. Worse, the older logic synthesis tools don't understand how to do carry chains for adders. So it would have been necessary to code all the 32 equations by hand. Too inefficient.

 

So I decided to let the horizontal counter count exactly as needed to generate the awkward scan address sequence of the Apple II. This saves one PLD package, and saves a lot of RTL coding work. The downside is that none of the 'known' equations for the horizontal timing - as seen, for instance, in Jim Sather's book - can be applied. This complication then led to some wrangling of the WNDW equation being needed until it was right.

This was caused by my intentional decision not to simulate the whole video pipeline on the Apple IIe motherboard, which would have taken far too much time to code, as I did not have a model for the timing HAL. So I decided to only simulate that the HST_STATE machine in the EP610 produced the LIVE, LOAD, HBLANK, and HSYNC states at the right point in time, and I had hoped to be able to fix, if necessary, whatever trivial equations would come after that.

 

Did this deliberate gap in the simulation save me time or cost me time ?

 

I think it saved me a lot of time despite the two iterations of the WNDW equation. The reason is that I have not tackled the substitution of the timing HAL yet, and without an exact RTL model for it, it is all but impossible to accurately simulate the rest of the video pipeline on the Apple IIe motherboard, and an imperfect, cruder simulation thereof would lead to problems with getting WNDW right the first time anyway. With a crude, imperfect model, you could only simulate that your design works with that crude kludge, which most likely is wrong, and hence, the simulation thereof is worthless.

 

In other words, if you know you can't build a truly accurate model of some 'missing pieces' outside of your design, it's probably not worth spending time on that. Instead, define a clean interface to that outside world which can be fully simulated on a higher abstraction level, and plan in resources to 'adjust' details of the actual signals going out later (this is why we always built PLD substitutes to exercise them in the target system, before designing the full custom or semicustom IC).

 

These are the kinds of decisions that have to be made about the depth of simulation needed. In this case, I decided that as long as it is an easy fix (20 minutes in total were spent to fix WNDW, proving that assumption right) it's not worth investing many days of work to reverse engineer the timing HAL and to model the video pipeline down to the pixel bit stream coming out of the 74166 shift register. It could be done, though. But I did not want to spend so much time on that topic.

 

LACK OF USEFUL TIMING HAL EQUATIONS

 

I could not find any usable equations for the timing HAL on the web. Those floating around pretending to have been leaked from Apple seem to be an earlier version, and the equations in Jim Sather's book are written in hard to decipher, non-canonical format, so my attempt to plug them into a PLD design tool did not yield a functional HAL --- seems I need to reverse engineer the HAL, too. For which I don't have the time right now.

 

CURRENT STATE OF THE PROJECT

 

As far as the MMU and IOU substitutes are concerned, I think that they are almost done. But lacking a NTSC Apple IIe, I can't verify that the colors in all graphics modes are right. There are several IOU outputs which control the colors in both LORES and HIRES modes (the IOU knows nothing about DBLHIRES, this happens elsewhere on the Apple IIe motherboard, and the timing HAL is the main actor for this).

 

So I decided to put this project on the back burner until the opportunity arises to get a NTSC Apple IIe (or a motherboard) at a good price. (No, I'm not going to pay the moon prices asked on Ebay for these Apple IIe items --- they are NOT that rare, as millions of Apple IIe have been made.)

 

But I am close enough to the finish line that I can do the math on how many TTL packages are absorbed by each PLD. I still need to think a bit about how that can be standardized in a meaningful way (I'm using some Altera EPLDs which ain't no 'classic' MMI PLDs). One approach could be to count macrocells and divide by 8 to arrive at an estimate on how many 'classic' MMI PLDs ('N') are the equivalent of one EP910 (N = 3 ?) or EP610 (N = 2 ?). This is a little bit hand-waving because input pin count also matters, and it impacts the partitioning of the logic into several PLD packages. If a smaller PLD lacks input pins for a given function, macrocells must be sacrificed by repurposing them as inputs. So the above proposition may not be realistic enough. But it still could be acceptable. I need to do a few pencil-on-paper exercises on how the IOU and MMU logic could be put into smaller 'classic' PLDs to arrive at a sound conclusion.

 

Stay tuned !

 

- Uncle Bernie

Some statistics on PLDs vs. TTL packages in this project

Here are some numbers to get a handle on the complexity of the MMU and IOU design (in terms of hardware 'units' needed).

 

Statistics of the TTL based IOU/MMU substitute

 

On the photo of Apple's wire wrapped Apple IIe prototype (the one with the two PCBs), I have counted 99 TTL packages in the left hand side PCB, which contains the TTL substitutes for the MMU and IOU custom ICs. The DIL-8 was not counted (a 555 Timer perhaps ?) and those IC sockets which looked empty or occupied by a DIL carrier for discrete components were excluded. Alas, due to the poor quality of the photo I can't be 100% sure that the "99 TTL" number is correct. But 1-2 TTL give or take doesn't matter here. Here is the photo:

 

The "TTL graveyard" in the yellow box is the TTL based substitute for the IOU and MMU which Apple built to verify the design before the custom ICs were made. Note that the TTL based substitute plugs into the two DIL-40 sockets for the IOU and MMU of the actual Apple IIe motherboard on the right hand side. One of the flat band cables is hidden below the other. So once the custom ICs became available, all they had to do was to unplug these flat band cables and put the custom IC prototypes in the sockets. The left hand side PCB with the "TTL graveyard" would be separated and the true Apple IIe prototype was left over. (I'd love to see a photo of that; if you know where it can be found, tell me.) This method of substituting custom ICs with TTLs was standard practice in the industry until the GALs came along in the mid 1980s. Then everybody would use a small "GAL graveyard" instead of a larger "TTL graveyard". The advantage of the GALs was instant reprogrammability to fix bugs / do changes. Bipolar PLDs were not liked for the purpose of debugging such prototypes, because each "spin" would land one such PLD in the trash can, and they were not cheap per blank PAL, and had high NRE costs if a programming system and design software had to be bought. So there was reluctance in the computer design community to adopt new ways. For the Apple IIe prototyping, PALs were probably too new on the market, so they still used the 'old ways'. The prototype of the Amiga computer designed around the same time also was based on many large "TTL graveyards" distributed over several large PCBs. But when the EEPROM based GALs came out in 1985, PLDs really took off, and the industry started to build functional prototypes for custom ICs from PLDs. So what are the statistics for them ?

 

Statistics of Uncle Bernie's  PLD based  IOU/MMU substitute

 

My PLD based substitute comprises 5 x TTLs, 4 x type 20V8/22V10 GALs, 2 x EP910, 1 x EP610. The fact that I did the autorepeat with a 74HCT132 in lieu of counters should not distract, because all the necessary additional counter bits are available in a single TTL package, too (i.e. 74LS393). So it's a wash when it comes to the autorepeat implementation with pure digital or with analog means.

 

Using "Small Classic PLD Equivalents", or "SCPEQ", which means 20L8, 20R4, 20R6, 20R8 types, each having eight macrocell equivalents, we arrive at the following numbers:

 

EP910, 24 macrocells = 3 SCPEQ each

EP610, 16 macrocells = 2 SCPEQ each

20V8/22V10, 8 macrocells = 1 SCPEQ each

 

 

The 22V10 of course has 10 macrocells, but in this design, I used fewer than 8 of them. Actually, it was a 20R4 before I had to do a small fix which required one input pin more. All the 16R4,6,8 and 20R4,6,8 waste one input pin if the clock also is needed as a combinatorial input. The 22V10 does not have that drawback, as its pin #1 always drives a column pair in the array, and it clocks all the flipflops at the same time. So by switching the 20R4 to the 22V10, I was able to gain the one uncommitted input pin I needed for the fix. The logic as such would easily fit into a 20R4, so it's fair to assume 22V10 = 1 SCPEQ in this design.

 

Now, the total number of SCPEQ is: 2 x 3 + 2 + 4 =  12.

 

Which means the logic would fit into a dozen of these small classic PLDs. Not bad.

 

They replace 99 - 5 = 94 TTLs, the latter being the TTLs used in my design.

 

So each of these small classic PLDs 'swallows' 94 / 12 = 7.83 TTLs - Let's be generous and round up to "8".

 

In other words, in this example, 8 TTL packages were absorbed by each small classic PLD.
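The arithmetic above can be redone in a few lines, which also makes it easy to try other assumptions. The sketch only uses the numbers given in this post (device counts, SCPEQ values, 99 TTL packages in Apple's prototype, 5 TTL packages left in the PLD version); the variable and function names are made up.

```python
# (device type, number used in the design, SCPEQ per device) as listed above
DEVICES = [
    ("EP910", 2, 3),
    ("EP610", 1, 2),
    ("20V8/22V10", 4, 1),
]
TTL_IN_APPLE_PROTOTYPE = 99   # counted on the photo, give or take 1-2
TTL_LEFT_IN_PLD_DESIGN = 5

def total_scpeq(devices):
    return sum(count * scpeq for _name, count, scpeq in devices)

def figure_of_merit(devices, ttl_before, ttl_after):
    """TTL packages absorbed per small classic PLD equivalent."""
    return (ttl_before - ttl_after) / total_scpeq(devices)

scpeq = total_scpeq(DEVICES)
merit = figure_of_merit(DEVICES, TTL_IN_APPLE_PROTOTYPE, TTL_LEFT_IN_PLD_DESIGN)
print(scpeq, round(merit, 2))
```

This prints 12 and 7.83, the same values as derived by hand above.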

 

Not bad.

 

Not bad at all.

 

(But I must admit it's a bit hilarious to do such a study on 40+ year old PLDs in the Y2023, where an IC may contain billions of MOSFETs.)

 

The number eight is a good rule of thumb. I have seen PLD based redesigns of industrial TTL "graveyards" which arrived at 'figures of merit' between 6 and 10.

 

I do have lots of old databooks on PLDs, including the very first Y1978 one by MMI, now a collectible all by itself, where they unveiled their brand new PLDs to the public. Here is a scan of the cover:

 

 

At that time, they only had 15 PAL family members, all in 20 pin packages, beginning with the PAL10H8 up to the PAL16L8, PAL16R4, PAL16R6 and PAL16R8. The 24 pin versions with 4 more inputs came later.

 

On page 1-10 of said data book they claim "Reduces chip count by 4 to 1". So, figure of merit = 4.

 

Which, as we can see, is a very conservative estimate. In case of the IOU/MMU substitute, a better figure of merit of 8 was achieved. But these devices had more pins. If the same logic would be reimplemented with the 20 pin PLDs, then a few more would be needed, due to pin per package constraints. It has been my experience that when partitioning logic into PLDs, it's not the product term number limitations which stop  the process of putting more and more logic into a given PAL, but instead, it's the limited number of input pins. Obviously, this is the reason why MMI added the 24 pin PAL family soon after the launch of the PALs. These did not increase the number of outputs (think: fixed architecture "macrocells", one per output pin) nor did they increase the number of product terms. They just added 4 input pins more, which means that the fuse matrix got 8 columns more.
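The effect of four more input pins on the array size can be put in numbers. The sketch below assumes the usual PAL array organisation, in which every array input is available true and complemented and therefore occupies two columns, and every column crosses every product term row. The figures of 16 resp. 20 array inputs and 64 product term rows for the 16L8 and 20L8 are quoted from memory of the data sheets, so please double check them before relying on them.

```python
def array_columns(array_inputs):
    """Each array input drives a true and a complement column."""
    return 2 * array_inputs

def array_fuses(array_inputs, product_terms):
    """One fuse at every crossing of a column and a product term row."""
    return array_columns(array_inputs) * product_terms

# (array inputs incl. feedback, product term rows), assumed values
PAL16L8 = (16, 64)
PAL20L8 = (20, 64)

extra_columns = array_columns(PAL20L8[0]) - array_columns(PAL16L8[0])
print(extra_columns, array_fuses(*PAL16L8), array_fuses(*PAL20L8))
```

Under these assumptions the 24 pin part gains 8 columns and its array grows from 2048 to 2560 fuses, while the number of outputs and product terms stays the same.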

 

How does my 'figure of merit' compare to the marketing claims of the PLD manufacturers ?

 

Here is why I was motivated to do this "figure of merit" study at all. Although I love PALs (and GALs, even more !) I always had the lingering memory that MMI (and other PLD vendors who jumped on the bandwagon) somehow had greatly exaggerated the package count reduction ratio (the "figure of merit"). Just this morning, after 40+ years of suspecting them to have greatly exaggerated (and hence, deceived me), I found out why this thought occurred to me and was lingering in my mind for all these decades. I finally found the culprit !

 

Look at the photo on the MMI 1978 "PAL HANDBOOK" above in this post. Did you notice that the photo shows ~ a dozen ICs on the PCB inside the big "PAL" ? This is the number, 12, I had remembered all the time, and I knew this number is wrong, because I never ever saw this package count reduction ratio in any PAL (or GAL) design effort. (Actually, if you look carefully, it is 13 (!) ICs and one electrolytic capacitor which can be spotted in the photo. 13 is the number of bad luck ! And bad luck they had - they could not manufacture the PALs, as their yield was terribly low. This nearly killed MMI. And the PALs. But this is another story. The yield problems were sorted out and the PALs became a tremendous success.)

 

As humble as these numbers appear from our Y2023 standpoint, the PAL was a revolutionary device. Digital logic design never was the same after PALs became available. And the progress with them was stunning. Within a decade or so, we had complex PLDs ("CPLDs") having 128 or 256 or even more flipflops. These can contain all the logic in the Apple IIe in one package. All you need to add is the 6502, the firmware ROM, and a few DRAMs. The 64k x 4 type being particularly attractive. Four of them are enough to get the 128 kBytes seen in the Apple IIe and Apple IIc.

 

So theoretically, in the early 1990s, hobbyists could have reimplemented the Apple IIe with just seven (!) off the shelf IC packages, no 'custom chips' involved. But AFAIK, that never did happen.
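The memory and package count arithmetic behind that claim can be checked with a short sketch. The package list below is my reading of the paragraph above (one CPLD, the 6502, one firmware ROM, four 64k x 4 DRAMs) and is an assumption, not a worked-out design.

```python
BITS_PER_BYTE = 8


def dram_bytes(words: int, bits_per_word: int) -> int:
    """Capacity of one DRAM chip in bytes."""
    return words * bits_per_word // BITS_PER_BYTE


chip_bytes = dram_bytes(64 * 1024, 4)       # one 64k x 4 DRAM
total_kbytes = 4 * chip_bytes // 1024       # four of them

# Assumed package list for the hypothetical early 1990s rebuild.
packages = {"CPLD": 1, "6502 CPU": 1, "firmware ROM": 1, "64k x 4 DRAM": 4}
package_count = sum(packages.values())
print(total_kbytes, package_count)
```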

 

Comments invited !

 

- Uncle Bernie

 

P.S.: as a footnote, PALs were not the first commercially available PLDs. In 1975, three years earlier than the launch of the MMI PAL, Signetics had launched their FPLF ("Field Programmable Logic Family"), but alas, having both programmable AND arrays and programmable OR arrays, they were difficult to use, and they were slower than typical TTL logic implementing the same function. PALs, in contrast, had the same or higher speed than TTL when implementing the same function. And this was the true innovation in the PALs - they had no programmable OR array. Which makes the die area smaller and the device faster. And this was critical for success in the marketplace. "Speed" always has been a key performance criterion for digital ICs, and any programmable logic family that was slower than TTL was not competitive in applications needing speed.

I still remember how difficult it was in the 1970s and early 1980s to work with MOS based gate arrays. They were just too slow. So the use of NMOS based custom ICs in a mainstream computer system was a bold move in the early 1980s. Apple did it in 1982 with the IOU and MMU. Atari had done it in 1978 with the ANTIC, CTIA and POKEY custom ICs used in the Atari 400/800.

The (in)famous Sinclair ZX81 used a Ferranti "ULA" semicustom IC which is based on a bipolar process, and was obsolete technology even back then. So it doesn't compete here. Typical for the genius of Clive Sinclair, they had squeezed more functionality in than Ferranti ever expected (years earlier, Sinclair also had squeezed a scientific calculator into a 4-bit TI calculator IC out of which TI's own programmers never were able to get more than a humble four function calculator). Consequently, the ULA ran too hot (bipolar ! ~50% more utilisation / power consumption than Ferranti ever expected ! Plastic package !) and so all too many ZX81 ULAs died. So, the ZX81 fan crowd produced a ULA substitute, which costs more than the whole ZX81.
Do you see the similarities and coincidences now ? What a funny world of vintage computing !

Joined: Apr 1 2020 - 16:46
Posts: 875
NTSC colors test completed !

Hi fans -

 

thanks to a generous donation by 'softwarejanitor' of two NTSC Apple IIe motherboards, I was able to do the final tests of my PLD based IOU/MMU substitute.

 

The original MMU and IOU were pulled from the sockets and my wire wrapped slot card with the substitutes was inserted. For those who did not read the whole thread, the slot card solution was necessary for two reasons:

 

First, I did not want to use two 40 pin ribbon cables (too unwieldy, and too expensive) so I only used one 50 wire ribbon cable which splits to two 40 pin DIL headers, one with the full 40 pins for the IOU socket, and one with 10 wires for the MMU socket. This means that I have to take the addresses A15-A0 from the slot.

 

But a slot card would have been needed anyways, which is the second reason: the current draw of all these PLDs is too much for a supply pin within an IC socket. That pin would get hot and burn out. The fingers on the slot card edge connector can take just enough current to keep all this in safe limits. Here is how the test setup looks:

 

 

The yellow circle in the above photo is the ESD foam with the original IOU and MMU. The slot card with the PLD based IOU/MMU substitutes is the brown card in the middle of the photo. The Apple IIe on the right hand side is a PAL version which now only acts as the power supply to the NTSC motherboard donated by 'softwarejanitor'. Note the missing "6" key cap on the keyboard. It was removed because the key switch does not work anymore. Small complication for my testing work - I had to assemble my test program such that no "6" is needed in the command to load it into the machine (from an AIFF file, via the cassette port).

 

 

On this I ran all the tests I wrote for my Apple-1 color graphics card, which can be seen in this link:

 

https://www.applefritter.com/content/glimpse-uncle-bernies-apple-1-color-graphics-card

 

A lot of the PLD design work on the video scanner was already done for this graphics card, so "lifting" it into the IOU substitute was easy.

 

The test program was loaded via the cassette port, and it automatically discerns between the Apple-1 and Apple II (or so I thought, more on this later, in the P.S. at the end of this post). This allows the same AIFF sound file to be loaded into either the Apple-1 with the color graphics card or the Apple II.

 

There are tests for each graphics mode except DOUBLE HIRES, which is an Apple IIe and IIc thing:

 

Text mode:

 

 

In the above photo, you can see an artifact from the blinking characters in the second character block from the top, character codes $40 to $7F. This is inevitable unless the shutter time on the camera could be set manually and somehow be synchronized to the frame rate of the TV.

 

Mixed screen mode, LORES color graphics and TEXT lines:

 

 

As it turned out, on the Apple IIe, the "Olive" color bar actually is "brown", as advertised in the Apple II User Manual. On my Apple-1 color graphics card, it is darker and looks more like "dark olive". (Note to self: investigate why).

 

HIRES graphics screen mode, with sprites stolen from the game "DROL":

 

 

 

(A much better photo of the same test screen can be found in the thread on the Apple-1 color graphics card, see the link mentioned at the beginning of this post - this poorer quality photo above is a clipped detail from the photo of the test setup above).

 

I did not write a DOUBLE HIRES test screen yet, so I used the title screen of the game "WINGS OF FURY":

 

 

Within the white circle in the above photo you can see that both LEDs on the IOU/MMU substitute card are lit. This means HIRES and 80COL soft switches are on = DOUBLE HIRES mode.
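How the two LEDs relate to the displayed mode can be sketched as a small decoder. This is a simplified model and not the logic inside the PLDs: the class and function names are hypothetical, and the sketch only follows the rule stated above (HIRES plus 80COL means DOUBLE HIRES). On real Apple IIe hardware the AN3 / DHIRES soft switch also takes part in selecting DOUBLE HIRES, which is deliberately left out here.

```python
from dataclasses import dataclass


@dataclass
class SoftSwitches:
    """Simplified soft switch state (names are illustrative assumptions)."""
    text: bool = True
    mixed: bool = False
    hires: bool = False
    col80: bool = False


def led_states(sw: SoftSwitches) -> tuple[bool, bool]:
    """The two LEDs on the substitute card show HIRES and 80COL."""
    return (sw.hires, sw.col80)


def video_mode(sw: SoftSwitches) -> str:
    """Decode the display mode. Simplified: the AN3 / DHIRES switch is ignored."""
    if sw.text:
        return "TEXT 80" if sw.col80 else "TEXT 40"
    if sw.hires:
        base = "DOUBLE HIRES" if sw.col80 else "HIRES"
    else:
        base = "LORES"
    return base + " + TEXT lines" if sw.mixed else base


title_screen = SoftSwitches(text=False, hires=True, col80=True)
print(led_states(title_screen), video_mode(title_screen))
```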

 

So far everything looks fine. I did not find any discrepancies in the screen layout or the colors. (The colors in the photos are a bit bland, but this is due to the camera, not due to the tested hardware).

 

The only weird thing I found is the following jaggy lines in a HIRES screen filled with rectangles:

 

 

This effect is present in this Apple IIe motherboard both with Apple's original MMU and IOU custom ICs and with my PLD substitutes, and looks the same in both cases. So it's not a flaw in my substitutes, nor a difference between them and the original custom ICs.

 

I wonder what causes this effect. On the TV it looks worse than in the photo. And it goes away when no horizontal lines are present (the center of the screen is empty / black).

 

Maybe somebody can tell me what the root cause of this effect is ? My own Apple-1 color graphics card does not have it, and my Taiwanese Apple II clone does not have it either. But this Apple IIe has it. It is weird, because it looks just as if there is one M14 master clock time delay causing these wiggly / jaggy vertical lines. Same delay as for the PHASE bit (bit #7 in HIRES) which turns magenta / green into orange / blue. But no such color shift is apparent in this test case. Actually, on the real screen, for human eyes, the magenta / green lines have the right color.
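To put a number on "one M14 master clock time delay", here is a small arithmetic sketch. It assumes the nominal NTSC values: a 14.31818 MHz master clock, a color subcarrier at one quarter of that, and HIRES pixels clocked at half of it. The variable names are illustrative only.

```python
M14_HZ = 14_318_180            # nominal NTSC master clock (assumption)

m14_period_ns = 1e9 / M14_HZ   # one M14 clock period
pixel_ns = 2 * m14_period_ns   # a HIRES pixel lasts two M14 periods
subcarrier_hz = M14_HZ / 4     # NTSC color subcarrier, about 3.58 MHz

# A delay of one M14 period, expressed as a phase angle of the subcarrier.
phase_shift_deg = 360.0 * (1.0 / M14_HZ) * subcarrier_hz

print(round(m14_period_ns, 2), round(pixel_ns, 2), round(phase_shift_deg, 1))
```

So a single M14 delay is about 70 ns, which is half a HIRES pixel on the screen and a quarter of a subcarrier cycle in color phase. That is why a shift of this size would normally show up as a color change and not only as a jaggy edge.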

 

If somebody has an NTSC Apple IIe and wants to volunteer to run a quick test, please comment, I will then put up the AIFF file with the graphics test program, along with instructions on how to load / run it. All you need is a media player able to play AIFF files and a 3.5mm audio cable with one MONO plug (Apple IIe side) and one STEREO plug (media player side). Anyone who built an Apple-1 with one of my kits (or following my "Tips & Tricks" pdf for Apple-1 builders) should have such a cable.

 

Comments invited !

 

- Uncle Bernie

Joined: Jul 5 2018 - 09:44
Posts: 2587
UncleBernie wrote:

Hi fans -

 

thanks to a generous donation by 'softwarejanitor' of two NTSC Apple IIe motherboards, I was enabled to do the final tests of my PLD based IOU/MMU substitute.

 

The original MMU and IOU were pulled from the sockets and my wire wrapped slot card with the substitutes was inserted. For those who did not read the whole thread, the slot card solution

 

 

If I had known you needed a keyswitch I would have sent you one with the motherboards.  I have a couple of donor keyboards so I have quite a few.  If you ever break a keyswitch stem I also have 3D printed key stems and adapters for both of the types of switches used in "SMK" keyboards.  I don't have anything for ALPS style keyboards, but those key switches are far less common on //e keyboards (super common on Macs).

 

If you need a few key switches let me know and I can send you some.

 

Joined: Apr 1 2020 - 16:46
Posts: 875
The jaggy line mystery is solved !

Investigation of the "jaggy vertical line" problem seen in post #21 showed that it's not a hardware bug in the Apple IIe, but a nasty, new, bad habit which the 1990s era TV I'm using must have developed in the last few months. Once the work with the Apple IIe motherboard was done, I set up my Apple-1 with the color graphics card, ran the same graphics test program, and got the same jaggy lines. I'm sure that when I ran the same test earlier this year, in March or so, the lines were perfect. So it seems something in the TV is slowly dying. Most likely, some issue in the power supply, maybe an electrolytic capacitor which is on its last legs. My hypothesis is that when the electron beam is turned on for the horizontal lines of these rectangles, then the added current draw causes a dip in the supply voltage for the horizontal output stage driving the flyback transformer, and this then leads to slight distortions in the horizontal deflection. Alas, unlike total failure of a component, these failing but still working component issues are time consuming to track down. I think I'd rather bring up one of my professional NTSC monitors from the basement - I have three which came out of a TV studio. But they also are very old. On the other hand, these are professional monitors which were built with better than consumer grade parts.

 

Yet another casualty of this work is my "Wings of Fury" floppy disk. It does not boot anymore. The only other one I have is of the "moldy" type which I don't want to use. Seems I need to go shopping on Ebay for another "Wings of Fury" floppy disk. This sucks. Seems the copy protection on these is so good that nobody can make backup copies, so my work eats one original after another. But this is expected - these floppy disks are very, very old and well more than 3-4 times past their expected lifetime.

 

Next step is the design and wire wrap of an Apple IIe replica based on my IOU/MMU substitutes. But this will take some time, so don't hold your breath.

 

- Uncle Bernie

 

 

Joined: Sep 9 2021 - 01:43
Posts: 23
Wings of Fury

Wings of Fury has been converted into a woz format.  This can be used with a floppy emulator or written to disk.  It might work copying back to your original disk.

Joined: Apr 1 2020 - 16:46
Posts: 875
(Deleted double post)
Double post deleted. (How can a double post happen ?)
Joined: Apr 1 2020 - 16:46
Posts: 875
Repairing Wings of Fury original disk

In post #24, Modnarmai wrote:

 

"Wings of Fury has been converted into a woz format.

It might work copying back to your original disk."

 

Uncle Bernie responds:

 

Oh, thanks a lot for the hint ! I'd hate to pay the outrageous prices currently being asked on Ebay for "Wings of Fury". Would be the third one.

 

I already have two original disks. One would load, but has the mold (mould) problem and so I had to quarantine it to avoid contamination of my other floppy disks with the mold spores. The other one looks perfectly fine on the surface but now it seems that track 00 (I guess) has been damaged by keeping it in the floppy drive and turning the power on and off. (Never do that). But these things happen in the heat of the battle. If I followed checklists step by step (as I do when flying small aircraft) these blunders would be avoided ("Open disk lever - remove floppy disk - master power switch off") but it would take so much time I would never get anywhere in any reasonable amount of time. Something similarly stupid must have happened to the original MMU IC, it does not work anymore. The Apple IIe almost works with it but never goes into text mode. So now I have the substitutes but no original MMU anymore !

 

Anyways, if you could enlighten me where to find the .WOZ file and which hardware/software is necessary to put the WOZ image back on the original floppy disk, please tell me. You might use a personal message. The irony is that I built such a piece of hardware myself, long ago, I call it the "Ratweasel", a pun on the more elegant and much more expensive "Catweasel", but my software development for it never got to the point where it could build WOZ image files (or put them back on disk). IIRC, I somehow got stuck at the point of trying to produce double wide tracks, which seemed to be trivial, but isn't.

 

(I have more unfinished projects than you would guess - the problem is that when others beat me to the finish line, then I lose the motivation to continue with my work. There is one super cheap USB based thingy out there which can read and write floppy disks at the flux level, but last time I looked, they still did not support the Apple II world, and now I seem to have forgotten its name, which also was a funny one.)

 

ADDENDUM: I remembered the name just after shooting off the post. It's the "Greaseweazle". Just looked up their github site --- it still seems they don't fully support Apple II images, at least I did not find any reference to the WOZ format (correct me if I'm wrong). They seem to support Mac better than Apple II.

 

- Uncle Bernie

MacFly's picture
Joined: Nov 7 2019 - 13:49
Posts: 447
Fritters with Applesauce

The applesauce project has software & hardware to read & write woz images - which usually preserves copy protection. However, if you just needed it for a single disk, might still be cheaper and easier to buy the disk from ebay...

A Wings of Fury woz image can be found here. It's also playable in emulators which support woz images.

However, if you do have an original disk and only track 0 got damaged, have you tried repairing it? I would make a dump of the entire disk, then compare the data with another image (e.g. from above). You would need to find an image of the disk which matches your version (except for your damaged track 0). Then you might get away with just rewriting the defective sectors on track 0. Rewriting existing sectors would not change the timing of these sectors between the tracks. Overwritten data is written at the same position / with the same timing as the former sector. Just formatting the disk would change the sector timing between the tracks.

If track 0 was completely damaged, and you cannot even write to it any longer, then you might still get lucky by formatting track 0 only - but keeping the rest of the disk untouched. Chances are, the copy protection isn't checking the timing of track 0, but just comparing the timing of some other tracks. In that case even formatting and completely rewriting track 0 wouldn't compromise the disk's copy protection. Of course, we don't know which tracks the software is checking for the protection. But in the worst case, if the reformatted track 0 was checked, then the disk would still be broken - just as it is now. So you have little to lose...
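The "dump, then compare" step could look roughly like the sketch below. It is only an illustration under strong assumptions: both images are plain decoded sector images of the same size, with 16 sectors of 256 bytes per track and the same sector ordering. A copy protected disk may not decode into that form at all, and a WOZ file stores bitstreams and not sectors, so it would have to be decoded first. The function name and constants are hypothetical.

```python
SECTOR_SIZE = 256
SECTORS_PER_TRACK = 16
TRACKS = 35


def differing_sectors(dump: bytes, reference: bytes, track: int = 0) -> list[int]:
    """Return the sector numbers on `track` whose contents differ."""
    if len(dump) != len(reference):
        raise ValueError("images must have the same size")
    track_start = track * SECTORS_PER_TRACK * SECTOR_SIZE
    bad = []
    for sector in range(SECTORS_PER_TRACK):
        lo = track_start + sector * SECTOR_SIZE
        hi = lo + SECTOR_SIZE
        if dump[lo:hi] != reference[lo:hi]:
            bad.append(sector)
    return bad


# Synthetic demo data: a blank reference image and a dump with one
# corrupted byte in sector 3 of track 0.
reference_image = bytes(TRACKS * SECTORS_PER_TRACK * SECTOR_SIZE)
damaged = bytearray(reference_image)
damaged[3 * SECTOR_SIZE + 10] ^= 0xFF
print(differing_sectors(bytes(damaged), reference_image))
```

Only the sectors reported for track 0 would then be candidates for rewriting; if other tracks differ as well, the reference image is probably a different version of the disk.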
