Final Design Goals

PCI express form factor
Embeddable synthesizable core (deeply embedded is OK)
Possible external USB/Thunderbolt graphics unit
The capability of driving multiple high resolution displays
Perfect display quality
A well defined and simple hardware interface
Extravagant supply and flexible use of frame buffer
Use of EFI/UGA for BIOS/boot display (instead of implementing a VGA mode)
A personal requirement is support for high resolution quad buffered stereo display.

Drawing Ops Required

Bit-blit
Single pixel line
Block fill
OpenGL flat/smooth/textured rendering
Alpha blending units to simplify desktop composition
?Additional units/functions?

My initial approach was to implement all of the above in specialized hardware units. However, due to the simplicity of writing shader programs, it is likely that most of the above will be handled in firmware. I am not sure if there will be an advantage in size or power to keep specialized functional units in the TVC.
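
As an illustration of how small such a firmware routine can be, here is a minimal sketch of the single pixel line operation using the integer-only Bresenham algorithm. It is written in Python for readability and is not TVC shader code; the flat row-major frame buffer and the name draw_line are assumptions made for this example.

```python
def draw_line(fb, width, x0, y0, x1, y1, color):
    """Plot a one-pixel-wide line into a flat, row-major frame buffer.

    Integer-only Bresenham, valid for all octants. The caller must keep
    both endpoints inside the buffer; no clipping is done here.
    """
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error term
    while True:
        fb[y0 * width + x0] = color
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


# Usage: an 8x4 buffer with a line from the top-left to the bottom-right.
frame = [0] * (8 * 4)
draw_line(frame, 8, 0, 0, 7, 3, 1)
```

The loop uses only additions, comparisons and a doubling, which is why it maps well onto either a simple shader program or a small dedicated unit.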

Anti-aliasing will be done at the whole display or rendering window level, not at the entity level.
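
The page does not say which anti-aliasing technique is planned. One common way to anti-alias a whole window rather than individual primitives is supersampling: render at a higher resolution, then average blocks of samples down to the display resolution. The sketch below assumes a 2x2 box filter over a single-channel, row-major buffer; the function name and the buffer layout are hypothetical.

```python
def downsample_2x(src, width, height):
    """Average each 2x2 block of a row-major buffer into one output pixel.

    width and height describe the supersampled source and must be even.
    The result has (width // 2) * (height // 2) entries.
    """
    out = []
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            total = (src[y * width + x] + src[y * width + x + 1] +
                     src[(y + 1) * width + x] + src[(y + 1) * width + x + 1])
            out.append(total // 4)  # integer average, rounds down
    return out


# Usage: a 4x2 supersampled buffer becomes a 2x1 display row.
display_row = downsample_2x([0, 255, 100, 100,
                             255, 0, 100, 100], 4, 2)
```

Because the filter runs once per output pixel regardless of scene content, its cost depends only on the window size, not on the number of primitives drawn.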

Performance Increases

Some back-of-the-envelope calculations with a dull crayon follow. These numbers are based on TVC release #5's pure firmware/software implementation. I tried to lean toward conservative estimates. These numbers assume moving to an Artix-7 XC7A200T and, again, no specialized rendering units.

Speedup	How
20x	More Cores
4x	Hardware FP Support
1.5x	Vector Ops
3x	Clock Speed
1.5x	Compiler Optimizations
540x	Total
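
The total is the product of the individual factors, which assumes that the gains are independent and compound. A quick check of the arithmetic (the dictionary keys simply repeat the table rows):

```python
from math import prod

# Individual speedup estimates, copied from the table above.
speedups = {
    "More Cores": 20,
    "Hardware FP Support": 4,
    "Vector Ops": 1.5,
    "Clock Speed": 3,
    "Compiler Optimizations": 1.5,
}

# Independent speedups multiply: 20 * 4 * 1.5 * 3 * 1.5
total = prod(speedups.values())
print(f"{total:g}x")
```

If any two factors overlap (for example, if compiler optimizations mainly help code that the vector ops also speed up), the real total would be lower than this product.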

Design Abstractions

TVC development has proceeded based on the abstraction that a video card is composed of four components. These components are: #1, a large block of memory used for the frame buffer, depth buffer, textures, commands, etc.; #2, a display module that pushes pixel values to an external display; #3, a communications link to a host processor; and #4, a block of drawing functional units, driven by the host, that perform drawing operations by reading from and writing to the large block of memory.

Most of the complexity specific to a video card is in the block of functional units. The functional units may be developed largely independently of each other, provided all components are designed around common internal interconnects.
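
The four-component abstraction can be modeled in a few lines of software. The following is only a toy sketch in Python, not the TVC's VHDL structure: the class names, the execute method and the command format are all assumptions chosen for illustration.

```python
class Memory:
    """Component 1: one flat block of words shared by everything else."""
    def __init__(self, size):
        self.words = [0] * size


class BlockSetUnit:
    """Component 4 (one functional unit): sets consecutive addresses."""
    def __init__(self, memory):
        self.memory = memory

    def execute(self, start, count, value):
        for addr in range(start, start + count):
            self.memory.words[addr] = value


class Display:
    """Component 2: reads the frame buffer region and emits pixel rows."""
    def __init__(self, memory, base, width, height):
        self.memory, self.base = memory, base
        self.width, self.height = width, height

    def scan_out(self):
        rows = []
        for y in range(self.height):
            start = self.base + y * self.width
            rows.append(self.memory.words[start:start + self.width])
        return rows


class HostLink:
    """Component 3: routes host commands to the named functional unit."""
    def __init__(self):
        self.units = {}

    def attach(self, name, unit):
        self.units[name] = unit

    def send(self, name, *args):
        self.units[name].execute(*args)


# Usage: the host clears a 4x2 frame buffer to 0xFF through the link.
memory = Memory(16)
link = HostLink()
link.attach("block_set", BlockSetUnit(memory))
display = Display(memory, base=0, width=4, height=2)
link.send("block_set", 0, 8, 0xFF)
```

Each functional unit only touches the shared memory and exposes one entry point, so a new unit can be attached without changing the display or the host link. That mirrors the independence described above.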

TODO

Move to a different development kit. The new kit should have at least a wide DDR SDRAM interface, a higher bit-depth display output (24 or 30bpp preferred), and high host command bandwidth. A PCIe interface would be good too.
Add multiple shader processors. Properly synchronize and share access to the frame buffer between active shader processors.
Fix bugs in the compiler and add some optimizations.
Get X running on the TVC with an emacs window editing TVC VHDL code (self hosting).
Mode setting. Probably 640x480, 800x600, and 1024x768.
Add performance counters to Engine. I need to know just how many pixels/sec and polygons/sec the device is pushing. A standard benchmark needs to exist and be run on each release.
Implement a rectangular block fill. This is different from the block_set MU in that the block_set MU only sets consecutive frame buffer addresses to a particular value (so it can clear the entire frame buffer or depth buffer), whereas a rectangular block fill will allow setting a rectangle of pixels within the frame buffer to a particular value. This may be accomplished by extending the block_set MU so that its current behavior is just a special case.
Implement drawing single pixel lines in a hardware unit.
Implement a hardware cursor.
Implement double buffering.
Add a command sequencer that will allow the execution of commands from the frame buffer.
Make a first cut at integrating into Mesa. Mesa can be compiled as a standalone library with an mgl (instead of gl) prefix for the API so as not to conflict with an existing OpenGL implementation.
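
The relationship between the rectangular block fill item and the existing block_set behavior can be sketched in software. This is a hedged Python model, not the MU's actual interface: the parameter names (base, pitch) and the flat-list memory are assumptions.

```python
def rect_fill(mem, base, pitch, x, y, w, h, value):
    """Set a w-by-h rectangle at (x, y) to value.

    base  is the address of the buffer's first pixel.
    pitch is the number of addresses per row.
    """
    for row in range(y, y + h):
        start = base + row * pitch + x
        for addr in range(start, start + w):
            mem[addr] = value


def block_set(mem, start, count, value):
    """Current block_set behavior: a run of consecutive addresses.

    Expressed as the special case of a rectangle that is one row high
    and exactly as wide as the run.
    """
    rect_fill(mem, start, count, 0, 0, count, 1, value)


# Usage: fill a 2x2 rectangle inside a 4x3 buffer.
mem = [0] * 12
rect_fill(mem, 0, 4, 1, 1, 2, 2, 7)
```

Clearing a whole buffer is the same call with the run length set to the buffer size, so a single extended unit could serve both uses.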

Done

Test pattern only display
Frame buffer shown on a VGA display, with pixels drawn over the parallel port
Add block set functional unit to accelerate clearing display and buffers
Depth buffered scan line implementation for flat shading
Implement simple texture mapping i.e. nearest pixel selected, no mip-map
Remove all dependence of the high level TVC design on a parallel port. This involved implementing the control bus.
Implement a simulation framework; both software and RTL
Implement a programmable shader processor
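
Two of the completed items, the depth buffered scan line for flat shading and nearest-pixel texture mapping, can be sketched as follows. This is an illustrative Python model rather than the TVC implementation; it assumes row-major buffers, a smaller z meaning nearer to the viewer, x0 <= x1, and texture coordinates u and v in the range 0 to 1.

```python
def draw_flat_span(fb, zbuf, width, y, x0, x1, z0, z1, color):
    """Draw one horizontal span in a single color with a depth test."""
    n = x1 - x0
    for i in range(n + 1):
        # Linearly interpolate depth along the span.
        z = z0 + (z1 - z0) * i / n if n else z0
        idx = y * width + x0 + i
        if z < zbuf[idx]:  # nearer than what is already stored
            zbuf[idx] = z
            fb[idx] = color


def sample_nearest(texture, tex_w, tex_h, u, v):
    """Pick the single nearest texel; no filtering and no mipmaps."""
    x = min(max(int(u * tex_w), 0), tex_w - 1)
    y = min(max(int(v * tex_h), 0), tex_h - 1)
    return texture[y * tex_w + x]


# Usage: a near span (color 3) overwrites part of a far span (color 1).
fb = [0] * 4
zbuf = [1.0] * 4
draw_flat_span(fb, zbuf, 4, 0, 0, 3, 0.5, 0.5, 1)
draw_flat_span(fb, zbuf, 4, 0, 1, 2, 0.2, 0.2, 3)
```

A textured span would replace the constant color with a call to sample_nearest using interpolated u and v values.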

Not Going To Do

Fix the bugs in the mcu-hl.

These bugs are likely the cause of the missing/incorrect pixels in the teapot test.

Reason: After playing with the design of an SDRAM controller, I realized that the current design of TVC is quite tied to the current memory controller, and the addition of the DQM (data mask) lines in SDRAM makes a tremendous amount of complexity in the MCU-HL obsolete and irrelevant. The entire memory controller needs to be ripped out and replaced (which will mean I need a different development kit). This change will ripple into the MU bus design.
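
For readers unfamiliar with data mask lines, the sketch below models what they do during a write: each DQM bit suppresses the update of one byte lane, so part of a word can be changed without reading it first. It does not claim to show which MCU-HL logic becomes unnecessary, since the page does not say. The function name and the 32-bit, four-lane word are assumptions.

```python
def masked_write(stored, incoming, dqm, lanes=4):
    """Return the memory word after a write with per-byte data masks.

    Bit i of dqm set means byte lane i is masked, i.e. left unchanged.
    """
    result = stored
    for lane in range(lanes):
        if not (dqm >> lane) & 1:
            shift = 8 * lane
            result &= ~(0xFF << shift)            # clear the old byte
            result |= incoming & (0xFF << shift)  # copy in the new byte
    return result


# Usage: update only the lowest byte of a 32-bit word.
word = masked_write(0x11223344, 0xAABBCCDD, 0b1110)
```

Without such masks, a controller that can only write whole words has to read the word, merge the new bytes and write it back to get the same effect.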