We present RetroDepth, a new vision-based system for accurately sensing the 3D silhouettes of hands, styluses, and other objects as they interact on and above physical surfaces. Our setup is simple, cheap, and easily reproducible, comprising two infrared cameras, diffuse infrared LEDs, and any off-the-shelf retro-reflective material. The retro-reflector aids image segmentation by creating a strong contrast between the surface and any object in proximity. A new, highly efficient stereo matching algorithm precisely estimates the 3D contours of interacting objects and of the retro-reflective surfaces. A novel pipeline enables 3D finger, hand, and object tracking, as well as gesture recognition, using only these 3D contours. We demonstrate high-precision sensing that robustly disambiguates between a finger or stylus touching, pressing, or interacting above the surface. This enables many interactive scenarios that seamlessly mix freehand 3D interactions with touch, pressure, and stylus input. As we show, these rich input modalities are enabled on and above any retro-reflective surface, including custom "physical widgets" fabricated by users. We compare our system with Kinect and Leap Motion, and conclude with limitations and future work.
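To make the two core ideas concrete, the following is a minimal sketch, not the RetroDepth implementation. It assumes that objects appear dark against the bright retro-reflective background in each infrared image, so a fixed intensity threshold can stand in for the segmentation step, and that the two cameras are rectified, so the depth of a matched contour point follows from standard stereo triangulation. The function names, the threshold value, and the camera parameters are all hypothetical and chosen only for illustration.

```python
from typing import List


def segment_silhouette(ir_image: List[List[int]], threshold: int = 128) -> List[List[bool]]:
    """Mark pixels darker than `threshold` as foreground.

    Assumption: the retro-reflector returns the LED light and appears bright,
    while any object in front of it appears dark. The threshold is hypothetical.
    """
    return [[pixel < threshold for pixel in row] for row in ir_image]


def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> float:
    """Triangulate depth (metres) for one contour point matched across a
    rectified stereo pair, using the textbook relation z = f * b / d."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Toy 3x4 infrared frame: a dark object (low values) on a bright background.
frame = [
    [250, 250, 250, 250],
    [250, 30, 40, 250],
    [250, 250, 250, 250],
]
mask = segment_silhouette(frame)

# One contour point seen at column 320 (left image) and 300 (right image),
# with a hypothetical 600 px focal length and 6 cm baseline.
z = depth_from_disparity(320.0, 300.0, focal_px=600.0, baseline_m=0.06)
```

Because only silhouette contours need to be matched in this setting, depth has to be computed for far fewer points than in dense stereo, which is one plausible reason a contour-based approach can be efficient.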