Vision-Guided Pick-and-Place with a 7DoF Robotic Arm
Together with Solomon Gonzalez, Mateusz Jaszczuk, and Andrik Puentes, we developed a robotic system that could autonomously detect, grasp, and stack blocks in both static and dynamic environments. The platform was built on the Franka Emika Panda, a seven-degree-of-freedom arm, and combined vision-based perception, kinematic planning, and grasp execution in a unified control pipeline. The final evaluation required robots to build tall, stable towers under time constraints, with performance measured on speed, reliability, and the ability to adapt to both stationary and moving objects.
System Architecture
The control framework was organized around two primary modules. The ArmController mapped task-level objectives into joint trajectories using inverse kinematics with null-space projection, ensuring that the arm reached targets while maintaining stable joint configurations. The ObjectDetector processed AprilTag detections from a wrist-mounted camera, transformed block poses into the world frame, and passed them to the controller for grasp execution. Together, these modules enabled continuous block localization and stable motion planning.
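The exact solver inside the ArmController is not reproduced here, but the null-space idea can be sketched at the velocity level roughly as follows; the `jacobian` callable, the gains, and the rest posture are placeholders rather than our actual implementation:

```python
import numpy as np

def ik_step_with_nullspace(q, x_err, jacobian, q_rest, alpha=0.5, k_null=0.1):
    """One velocity-level IK step with a null-space bias toward a rest posture.

    q        : (7,) current joint angles
    x_err    : (6,) end-effector error (position and orientation)
    jacobian : callable returning the 6x7 geometric Jacobian at q
    q_rest   : (7,) preferred joint configuration used as the secondary task
    """
    J = jacobian(q)                      # 6x7
    J_pinv = np.linalg.pinv(J)           # Moore-Penrose pseudo-inverse
    # Primary task: drive the end-effector error toward zero.
    dq_task = alpha * J_pinv @ x_err
    # Secondary task: pull the joints toward q_rest, projected into the
    # null space of J so it cannot disturb the end-effector motion.
    N = np.eye(7) - J_pinv @ J
    dq_null = k_null * N @ (q_rest - q)
    return q + dq_task + dq_null
```

The secondary task is what keeps the elbow in a predictable configuration across repeated reaches, which mattered for repeatable grasps.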
To reduce planning complexity, we relied on a small set of predefined task configurations: an initialization pose, scanning poses for both static and dynamic blocks, and a placement pose over the stacking platform. These anchor points improved predictability and reduced computational load during repeated tasks.
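Concretely, the anchors amounted to a small lookup table of named joint-space configurations; the values below are placeholders rather than the poses tuned on the robot:

```python
import numpy as np

# Illustrative anchor configurations (joint angles in radians); the values
# actually used on the robot were tuned by hand and are not reproduced here.
TASK_CONFIGS = {
    "init":         np.array([0.0, -0.785, 0.0, -2.356, 0.0, 1.571, 0.785]),
    "scan_static":  np.array([0.0, -0.600, 0.0, -2.000, 0.0, 1.600, 0.785]),
    "scan_dynamic": np.array([0.6, -0.400, 0.0, -1.800, 0.0, 1.500, 0.785]),
    "place":        np.array([-0.3, -0.500, 0.0, -1.900, 0.0, 1.600, 0.785]),
}
```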
Static Block Manipulation
The static block pipeline was highly reliable. Once an AprilTag was detected, the block's pose was computed in the world frame, and the ArmController generated an alignment trajectory. The gripper descended, closed, and lifted the block, then placed it onto the scoring platform with a z-offset proportional to the current tower height. Simulation results showed perfect success rates, while hardware introduced some errors due to calibration offsets, block irregularities, and inconsistent gripper force. Overall, the static task provided consistent points during competition.
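The core of this pipeline is a short, fixed grasp sequence. A stripped-down sketch, with `arm` and `gripper` standing in for the real controller and gripper interfaces (whose method names differ), looks like this; the placement offset is covered under Block Stacking below:

```python
def above(pose, dz=0.10):
    """Return a copy of a 4x4 world-frame pose raised by dz along world z."""
    lifted = pose.copy()
    lifted[2, 3] += dz
    return lifted

def pick_block(arm, gripper, block_pose):
    """Illustrative grasp sequence; `arm` and `gripper` are placeholder
    interfaces, not the project's actual class names."""
    arm.move_to_pose(above(block_pose))   # align above the detected block
    arm.move_to_pose(block_pose)          # descend onto the block
    gripper.close()                       # grasp
    arm.move_to_pose(above(block_pose))   # lift clear of the table
```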
Dynamic Block Manipulation
The dynamic task introduced greater complexity. Blocks moved along a rotating turntable, requiring prediction of their future positions. By estimating the angular velocity of the platform, the system extrapolated the expected angular displacement at grasp time, then computed the target pose in the world frame. The ArmController generated interception trajectories aligned to these predictions.
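The prediction step reduces to extrapolating the block's angle about the turntable axis by the estimated angular velocity times the lead time. A minimal sketch with illustrative parameter names, omitting the orientation handling:

```python
import numpy as np

def estimate_omega(theta_prev, theta_now, dt):
    """Turntable angular velocity from two successive detections (rad/s)."""
    dtheta = (theta_now - theta_prev + np.pi) % (2 * np.pi) - np.pi  # unwrap
    return dtheta / dt

def predict_block_position(theta_now, omega, radius, center_xy, z_block, t_lead):
    """Extrapolate a block's world-frame position on the turntable.

    theta_now : current angle of the block about the turntable axis (rad)
    omega     : estimated angular velocity of the turntable (rad/s)
    radius    : distance of the block from the turntable axis (m)
    center_xy : (x, y) of the turntable axis in the world frame
    z_block   : height of the block's grasp point in the world frame (m)
    t_lead    : lead time until the planned grasp (s)
    """
    theta_future = theta_now + omega * t_lead
    x = center_xy[0] + radius * np.cos(theta_future)
    y = center_xy[1] + radius * np.sin(theta_future)
    return np.array([x, y, z_block])
```

Any error in the lead time or the angular velocity estimate translates directly into a lateral grasp error, which is why this task was far more sensitive to perception latency than the static one.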
Simulation success exceeded seventy percent, but hardware performance was closer to fifty-seven percent, affected by camera noise, delays in pose estimation, and calibration drift. Mitigations included rejecting detections near the gripper and elevating approach trajectories to avoid collisions, which improved consistency but underscored the sensitivity of dynamic grasping to timing and noise.
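The detection-rejection mitigation can be illustrated with a simple world-frame filter; the thresholds are placeholders rather than the values we tuned:

```python
import numpy as np

def filter_detections(detections, gripper_xy, min_dist=0.08, z_band=(0.02, 0.30)):
    """Drop world-frame detections that are implausibly close to the gripper or
    outside the expected height band (thresholds here are illustrative)."""
    kept = []
    for pos in detections:  # pos: (x, y, z) in the world frame
        too_close = np.linalg.norm(np.asarray(pos[:2]) - np.asarray(gripper_xy)) < min_dist
        out_of_band = not (z_band[0] <= pos[2] <= z_band[1])
        if not (too_close or out_of_band):
            kept.append(pos)
    return kept
```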
Block Stacking
Both static and dynamic blocks were stacked using the same placement routine. After each successful grasp, the system updated the tower height estimate and applied incremental offsets for precise placement. The inverse kinematics solver maintained alignment across multiple placements, producing stable towers under repeated trials. Placement accuracy was especially important as the scoring platform offered only a narrow surface, and any lateral drift accumulated over successive layers.
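The bookkeeping behind the incremental offsets is simple; a sketch with an assumed nominal block height:

```python
BLOCK_HEIGHT = 0.05  # m, nominal block size (assumed, not the measured value)

class TowerTracker:
    """Tracks placed blocks and computes the drop height for the next one."""

    def __init__(self, platform_z):
        self.platform_z = platform_z  # world z of the platform's top surface
        self.count = 0                # blocks placed so far

    def next_place_z(self, clearance=0.005):
        # World z of the next block's bottom face: one block height per layer
        # already placed, plus a small clearance so the block is released just
        # above the tower rather than pressed into it.
        return self.platform_z + self.count * BLOCK_HEIGHT + clearance

    def confirm_placement(self):
        self.count += 1
```

Because the platform surface was narrow, keeping this estimate accurate mattered as much as the lateral alignment: releasing a block too high let it settle unpredictably, while releasing too low risked disturbing the layers beneath.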
Evaluation and Strategy
Our competitive strategy prioritized reliability by targeting static blocks first, then attempting dynamic blocks with remaining time. This ensured steady point accumulation while still showcasing the dynamic tracking pipeline. The system advanced to the quarterfinals and demonstrated both robust static stacking and functional, if less consistent, dynamic interception.
Challenges and Lessons Learned
Hardware testing revealed discrepancies between simulation and the physical system. A narrower camera field of view than modeled and calibration offsets caused systematic errors, while variations in gripper force reduced grasp reliability. Real-time delays in perception and trajectory execution limited dynamic interception success, and frequent inverse kinematics calls imposed computational overhead. We addressed these issues with repeated calibration, offset tuning, optimized code paths, and safety thresholds in detection. The project emphasized the importance of robust perception and hardware-aware planning when transferring algorithms from simulation to real systems.