Final Design Goals
- PCI Express form factor
- Embeddable synthesizable core (deep embedded ok)
- Possible external USB/Thunderbolt graphics unit
- The capability of driving multiple high resolution displays
- Perfect display quality
- A well defined and simple hardware interface
- Extravagant supply and flexible use of frame buffer
- Use of EFI/UGA for BIOS/boot display (instead of implementing
a VGA mode)
- A personal requirement is support for high resolution quad
buffered stereo display.
Drawing Ops Required
- Bit-blit
- Single pixel line
- Block fill
- OpenGL flat/smooth/textured rendering
- Alpha blending units to simplify desktop composition
- ?Additional units/functions?
My initial approach was to implement all of the above in
specialized hardware units. However, due to the simplicity
of writing shader programs, it is likely that most of the above
will be handled in firmware. I am not sure if there will be
an advantage in size or power to keep specialized functional units
in the TVC.
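As an illustration of how little code such an operation needs when handled in firmware or software, here is a minimal Python sketch of a source-over alpha blend on 8-bit channels. The function names and the 0 to 255 alpha convention are assumptions made for this example, not part of the TVC design.

```python
def blend_channel(src, dst, alpha):
    """Source-over blend of one 8-bit channel.

    alpha runs from 0 (fully transparent source) to 255 (fully opaque).
    The +127 rounds to nearest instead of truncating.
    """
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def blend_pixel(src_rgb, dst_rgb, alpha):
    """Blend an (r, g, b) source pixel over a destination pixel."""
    return tuple(blend_channel(s, d, alpha) for s, d in zip(src_rgb, dst_rgb))


opaque = blend_pixel((255, 0, 0), (0, 0, 255), 255)
clear = blend_pixel((255, 0, 0), (0, 0, 255), 0)
half = blend_pixel((255, 0, 0), (0, 0, 255), 128)
```

The same arithmetic, one multiply-add per channel, is what a dedicated blending unit would have to provide, which is why it is a reasonable candidate for either approach.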
Anti-aliasing will be done at the whole display or rendering
window level, not at the entity level.
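One common way to anti-alias at the whole-display level is supersampling: render at a higher resolution, then average blocks of pixels down to the display resolution. The text does not say which method the TVC will use, so the following Python sketch only illustrates that general idea, with a hypothetical `downsample_2x` helper working on a grayscale image stored as a list of rows.

```python
def downsample_2x(image):
    """Average each 2x2 block of a grayscale image into one output pixel.

    `image` is a list of equal-length rows with even width and height.
    """
    out = []
    for y in range(0, len(image) - 1, 2):
        row = []
        for x in range(0, len(image[0]) - 1, 2):
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append((total + 2) // 4)  # rounded average of four samples
        out.append(row)
    return out


# A 4x2 render target: a hard black/white edge next to a flat gray area.
hi_res = [
    [0, 255, 100, 100],
    [255, 0, 100, 100],
]
lo_res = downsample_2x(hi_res)
```

Because the filter runs over the finished image, it needs no knowledge of the lines or polygons that produced it.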
Performance Increases
Some back-of-the-envelope calculations with a dull crayon follow.
These numbers are based on TVC release #5's pure firmware/software
implementation. I tried to lean toward conservative estimates.
These numbers assume moving to an Artix-7 XC7A200T, and again no
specialized rendering units.
Speedup | How
20x     | More Cores
4x      | Hardware FP Support
1.5x    | Vector Ops
3x      | Clock Speed
1.5x    | Compiler Optimizations
540x    | Total
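The total is the product of the individual factors, which assumes the speedups are independent and compose multiplicatively. A quick check in Python (the dictionary is only there for illustration):

```python
from math import prod

# Estimated speedup factors taken from the table above.
speedups = {
    "More Cores": 20,
    "Hardware FP Support": 4,
    "Vector Ops": 1.5,
    "Clock Speed": 3,
    "Compiler Optimizations": 1.5,
}

total = prod(speedups.values())
print(f"{total:g}x")  # 540x
```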
Design Abstractions
TVC development has proceeded based on the abstraction that a
video card is composed of four components. These components
are: #1 a large block of memory used for the frame buffer, depth
buffer, textures, commands, etc., #2 a display module that
pushes pixel values to an external display, #3 a communications
link to a host processor and #4, a block of drawing functional
units driven by the host that perform drawing operations by
reading and writing to the large block of memory.
Most of the complexity specific to a video card is in the block
of functional units. The functional units may be developed
largely independently of each other, provided all components are
designed around common internal interconnects.
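The four-component abstraction can be sketched as a small software model. This is an illustrative Python toy, not the TVC's VHDL structure: the class names, the `execute` interface, and the command format are all assumptions made for the example.

```python
class Memory:
    """#1: one large block shared by frame buffer, depth buffer, textures, etc."""
    def __init__(self, words):
        self.words = [0] * words


class BlockSetUnit:
    """One drawing functional unit: sets consecutive words to a value."""
    def execute(self, mem, start, count, value):
        for addr in range(start, start + count):
            mem.words[addr] = value


class Display:
    """#2: reads pixel values out of memory for the external display."""
    def __init__(self, base, width, height):
        self.base, self.width, self.height = base, width, height

    def scan_out(self, mem):
        rows = []
        for y in range(self.height):
            first = self.base + y * self.width
            rows.append(mem.words[first:first + self.width])
        return rows


class VideoCard:
    """#3 is modeled by submit(): the host sends (unit name, arguments).
    #4 is the dictionary of functional units sharing one interface."""
    def __init__(self, mem, display, units):
        self.mem, self.display, self.units = mem, display, units

    def submit(self, unit, *args):
        self.units[unit].execute(self.mem, *args)


card = VideoCard(Memory(16), Display(0, 4, 2), {"block_set": BlockSetUnit()})
card.submit("block_set", 4, 4, 9)  # fill the second display row
frame = card.display.scan_out(card.mem)
```

Adding a new functional unit here means adding one class with the same `execute` signature, which mirrors the point that units can be developed independently once the interconnect is fixed.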
TODO
- Move to a different development kit. The new kit should have
at least a wide DDR SDRAM interface, a higher bit depth display (24 or
30bpp preferred), and high host command bandwidth. A PCI-e
interface would be good too.
- Add multiple shader processors. Properly synchronize / share
access to the frame buffer between active shader processors.
- Fix bugs in the compiler and add some optimizations.
- Get X running on the TVC with an Emacs window editing TVC VHDL
code (self hosting).
- Mode setting. Probably 640x480, 800x600, and 1024x768.
- Add performance counters to Engine. I need to know just
how many pixels/sec and polygons/sec the device is
pushing. A standard benchmark needs to exist and be run on
each release.
- Implement a rectangular block fill. This is different
from the block_set MU in that the block_set MU only sets
consecutive frame buffer addresses to a particular value (so it
can clear the entire frame buffer or depth buffer), whereas a
rectangular block fill will allow setting a rectangle of pixels
within the frame buffer to a particular value. This may be
accomplished by extending the block_set MU so that its
current behavior is just a special case.
- Implement drawing single pixel lines in a hardware unit.
- Implement a hardware cursor.
- Implement double buffering.
- Add a command sequencer that will allow the execution of
commands from the frame buffer.
- Make a first cut at integrating into Mesa. Mesa can be
compiled as a standalone library with an mgl (instead of gl)
prefix for the API so as not to conflict with an existing OpenGL
implementation.
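The rectangular block fill item above can be sketched in software to show why the current block_set behavior falls out as a special case. This is a minimal Python model, assuming one word per pixel and a row-major frame buffer; `pitch` (words per row) and the function signatures are assumptions for illustration, not the MU's actual interface.

```python
def block_set(mem, start, count, value):
    """Set `count` consecutive words of `mem`, starting at `start`, to `value`."""
    for addr in range(start, start + count):
        mem[addr] = value


def rect_fill(mem, pitch, x, y, width, height, value):
    """Fill a width x height rectangle whose top-left pixel is (x, y).

    Each row of the rectangle is one linear block_set of `width` words.
    """
    for row in range(y, y + height):
        block_set(mem, row * pitch + x, width, value)


# A tiny 8x4 frame buffer, one word per pixel.
WIDTH, HEIGHT = 8, 4
fb = [0] * (WIDTH * HEIGHT)
rect_fill(fb, WIDTH, x=2, y=1, width=3, height=2, value=7)
filled = [addr for addr, v in enumerate(fb) if v == 7]

# A full-width rectangle covers contiguous rows, so it touches exactly the
# same addresses as a single linear block_set over the whole buffer.
cleared = list(fb)
rect_fill(cleared, WIDTH, 0, 0, WIDTH, HEIGHT, 0)
```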
Done
- Test pattern only display
- Frame buffer drawn on VGA display with drawing pixels over
parallel port
- Add block set functional unit to accelerate clearing display
and buffers
- Depth buffered scan line implementation for flat shading
- Implement simple texture mapping i.e. nearest pixel selected,
no mip-map
- Remove all dependence of the high level TVC design on a
parallel port. This involved implementing the control bus.
- Implement a simulation framework; both software and RTL
- Implement a programmable shader processor
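The nearest-pixel texture mapping item above amounts to picking the single texel closest to the requested coordinate, with no filtering and no mip-map selection. A minimal Python sketch follows; the normalized (u, v) convention in [0, 1] and the `sample_nearest` name are assumptions for illustration, not the TVC's actual texture addressing.

```python
def sample_nearest(texture, u, v):
    """Return the texel nearest to normalized coordinates (u, v).

    `texture` is a list of rows. Coordinates are clamped so that values
    outside [0, 1], including exactly 1.0, stay inside the texture.
    """
    height = len(texture)
    width = len(texture[0])
    x = max(0, min(int(u * width), width - 1))
    y = max(0, min(int(v * height), height - 1))
    return texture[y][x]


checker = [
    [1, 2],
    [3, 4],
]
corners = [
    sample_nearest(checker, 0.1, 0.1),
    sample_nearest(checker, 0.9, 0.1),
    sample_nearest(checker, 0.1, 0.9),
    sample_nearest(checker, 1.0, 1.0),
]
```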
Not Going To Do
- Fix the bugs in the MCU-HL. These bugs are likely the cause of
the missing/incorrect pixels in the teapot test.
Reason: After playing with the design of an SDRAM controller, I
realized that the current design of TVC is quite tied to the current
memory controller, and the addition of the DQM (data mask) lines in
SDRAM makes a tremendous amount of complexity in the MCU-HL obsolete
and irrelevant. The entire memory controller needs to be
ripped out and replaced (which will mean I need a different
development kit). This change will ripple changes into the MU
bus design.