First of all, you need a 3D app where you can do animation. There are several.
Blender is probably the best bet if you're looking for a FREE app.
Other than that, the prices vary. 3ds Max seems to be popular among some crowds, Maya among others. I've been using LightWave 3D myself for about 15 years, and recently switched to Modo. Both LightWave and Modo cost about $1000-1700. Max and Maya cost a bit more. But don't be fooled. The price tag doesn't equal 'better'. Maya is terrific for character animation in general, and is also hands down the most awesome tool I've tried for rigging.
Modo and LightWave are superb for modeling and rendering out of the box.
Ok, so that's the 3D app settled.
Next, you need an app to track the camera moves. You can do this directly in many compositors, such as After Effects, Nuke or Fusion, but there are also dedicated tracking apps. My favorite is SynthEyes, a low-cost (compared to the competition) package that can do wonders on even the most challenging shots.
Regardless, in the end, after you've modeled, textured, animated and rendered your animations, you need to composite the renders into the video footage. You could use any compositor. Even Photoshop (but boy, would it be time-consuming...)
I prefer Nuke. Node-based. Awesome stuff. After Effects is also a good alternative, and a cheaper one. Fusion is a great alternative too, and comes in a free version.
Nuke also has a free "personal learning" edition, which means you cannot make money on what you make with it, plus a few other restrictions, but it is definitely a great way to learn Nuke and doodle with your own projects, such as the two above.
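If you're curious what a compositor actually does when it layers a render on top of footage, here is a minimal sketch of the textbook "over" operation in plain Python. It assumes premultiplied RGBA pixels stored as float tuples in the 0.0-1.0 range. The function names (over, composite_frame) are made up for illustration and are not any app's API. Real compositors do this on whole images, far faster, and with color management on top.

```python
def over(fg_pixel, bg_pixel):
    """Layer one premultiplied RGBA pixel over another.

    Each pixel is an (r, g, b, a) tuple of floats in 0.0-1.0.
    With premultiplied alpha, every channel follows the same rule:
    result = foreground + background * (1 - foreground alpha)
    """
    fg_alpha = fg_pixel[3]
    return tuple(f + b * (1.0 - fg_alpha) for f, b in zip(fg_pixel, bg_pixel))


def composite_frame(render, footage):
    """Apply 'over' pixel by pixel to two frames of identical size.

    Frames are lists of rows, and each row is a list of RGBA tuples
    (an assumed toy layout, chosen only to keep the example short).
    """
    return [
        [over(fg, bg) for fg, bg in zip(render_row, footage_row)]
        for render_row, footage_row in zip(render, footage)
    ]


if __name__ == "__main__":
    # A half-transparent red render pixel over an opaque blue footage pixel.
    half_red = (0.5, 0.0, 0.0, 0.5)
    blue = (0.0, 0.0, 1.0, 1.0)
    print(over(half_red, blue))
```

Where the render is fully opaque (alpha 1.0) the footage is replaced, and where it is fully transparent (alpha 0.0) the footage shows through untouched. That is why you render your 3D elements with an alpha channel.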
The pipeline would be:
- Shoot material
- Track
- Animate the objects
- Lighting and rendering to match the original footage
- Compositing
Have fun!