The issues with SLI in VR come down to keeping the single image produced by the two cards coherent enough that it never lags or fragments across the HMD's two displays. It's a difficult problem to solve completely.
Here's the wiki on SLI, which lays good groundwork for the most common ways of splitting a rendering workload across two cards. None of those approaches scales well to VR, primarily because your visual and vestibular systems are
extremely sensitive to latency or desynchronization between what each of your eyes sees. Even "card A does the left eye, card B does the right eye" doesn't really work, because the output of the two cards isn't reliably low-latency and in-sync enough to hold a continuous 90 Hz image.
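To put a rough number on that: at 90 Hz you get about 11.1 ms per refresh to have both eye images finished and handed off together. Here's a back-of-the-envelope sketch (plain Python with made-up render times, not real benchmarks) of why a per-eye split only runs as fast as whichever card happens to be slower on a given frame:

```python
# Back-of-the-envelope sketch of the per-eye split at 90 Hz.
# The per-frame render times below are made up for illustration.
REFRESH_HZ = 90
frame_budget_ms = 1000.0 / REFRESH_HZ  # ~11.1 ms to have BOTH eyes ready

left_eye_ms = [9.0, 10.5, 11.8, 9.5]   # hypothetical times for card A
right_eye_ms = [10.2, 9.1, 9.0, 12.3]  # hypothetical times for card B

for i, (left, right) in enumerate(zip(left_eye_ms, right_eye_ms)):
    # The compositor can only present when BOTH eyes are done, so the
    # effective frame time is whichever card finished last (ignoring
    # sync/transfer overhead, which only makes things worse).
    effective = max(left, right)
    status = "ok" if effective <= frame_budget_ms else "MISSED refresh -> judder"
    print(f"frame {i}: left={left} ms, right={right} ms -> {effective} ms ({status})")
```

In that toy run, frames 2 and 3 blow the budget even though one of the cards finished each of them with room to spare; the compositor can't show one eye early and the other late without making you sick.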
The workarounds for SLI in VR will almost certainly need to be done at the API and driver level, since for most developers the video subsystem is essentially a black box that they interact with via DirectX or OpenGL. Nvidia and AMD are
working on it, but it's unclear at this point whether the eventual fix to make multi-GPU rendering work well in VR will be a drop-in one (like, install these new VR drivers and boom, SLI/CrossFire now works in VR!) or will require developers to rewrite their applications to take advantage of new APIs.
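To make that "drop-in driver fix vs. new APIs" distinction concrete, here's a purely hypothetical sketch (mock Python objects, not any real driver or SDK) of what the difference looks like from the application's side:

```python
# Purely hypothetical sketch -- mock objects, not a real graphics API.

class Gpu:
    """Stand-in for one physical card."""
    def __init__(self, name):
        self.name = name

    def render(self, eye, scene):
        return f"{eye}-eye image of '{scene}' rendered on {self.name}"


def render_frame_implicit(logical_gpu, scene):
    # "Drop-in" world: the app still talks to one logical GPU and the
    # driver quietly spreads the work across the physical cards.
    # The app code doesn't change at all.
    return logical_gpu.render("left", scene), logical_gpu.render("right", scene)


def render_frame_explicit(gpu_a, gpu_b, scene):
    # "New API" world: the app itself addresses each card, decides which
    # eye goes where, and has to synchronize both results so they land
    # on the same refresh before presenting.
    left = gpu_a.render("left", scene)
    right = gpu_b.render("right", scene)
    # ...an explicit sync/copy step would go here...
    return left, right


print(render_frame_explicit(Gpu("card A"), Gpu("card B"), "cockpit scene"))
```

The first path is painless for developers but leaves the driver guessing; the second gives the app real control but means new code for every engine.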
Bottom line: we're in the early-adopter phase, and some things just aren't going to work because the problems are still being solved. If you're buying into VR right now, you are a guinea pig, just like any early adopter. If you want flawless SLI support, hop in your time machine, head a year or so into the future, and see how things have turned out.
edit - more thoughts -
It's also important to understand that when you add a second graphics card, it doesn't just magically weld itself to the first one and become a single, more powerful video card (even though a LOT of brilliant driver engineering has gone into making it look and act that way, so you can game on and not have to care about what's happening under the hood). That second video card is just that: a second video card, and it has to coordinate with the first one and figure out a way of dividing the work of rendering frames between the two of them.
The methods the cards use are in that wiki article up there, but parallelization like this is difficult, and not every kind of computing problem is easy to parallelize. On the surface it seems like having one card render the even frames and the other render the odd frames is the obvious way to do it, but in practice it often doesn't work out that way, because frames can contain different levels of complexity and the cards can finish their rendering at different times, for example.
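Here's a toy illustration of that (plain Python, invented frame times): card A takes the even frames, card B takes the odd ones, and a couple of expensive frames are enough to knock them out of step.

```python
# Toy alternate-frame-rendering sketch: card A renders even frames, card B odd.
# The per-frame costs below are invented; the point is that uneven frame
# complexity makes finished frames arrive out of order and at an uneven pace.

frame_times_ms = [8.0, 9.0, 16.0, 9.0, 8.5, 15.0]  # hypothetical render costs

card_free_at = {"A": 0.0, "B": 0.0}  # when each card can start its next frame

for frame, cost in enumerate(frame_times_ms):
    card = "A" if frame % 2 == 0 else "B"
    start = card_free_at[card]
    finish = start + cost
    card_free_at[card] = finish
    print(f"frame {frame} on card {card}: starts {start:5.1f} ms, done {finish:5.1f} ms")
```

With those numbers, frame 3 actually finishes before frame 2 does, so the driver has to buffer and re-pace the output (adding latency) or let delivery get choppy, which is exactly the kind of thing your eyes and inner ear notice in VR.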
Some rendering jobs really are easier to do with a single video card, just like some kinds of work don't benefit from extra hands. For a sense of the difference, start with a task that does parallelize: making hamburgers. You put one person on the hamburger line and say, "Make me 10 hamburgers," and that person first grabs the buns, grills the meat, puts on the lettuce and the onions and whatever, and eventually finishes the first burger. Then he goes back, grabs more buns, grills more meat, and so on, and makes the second burger.
Add a second person, though, and the output goes up immediately. The second person can put the condiments on one burger while the first guy grills the next, or the first guy can watch the grill while the second cuts tomatoes, or whatever. Making hamburgers is a process that benefits from parallelization.
Compare that, though, to creating a painting. Sit an artist down with some oils and tell him to make a cool Bob Ross landscape or whatever, and in some period of time a landscape will come out of his brush (we're going to assume our person can in fact paint). But adding a second artist doesn't mean the painting gets done twice as fast: how does the second person know what the first person is thinking? Does the second person take half the palette and use the browns and blues while the first person uses the reds and whites, or whatever? How do they coordinate where each of them is painting? What if they both need to paint in the same area, or mix brown with red?
Painting a picture is a process that does not benefit from parallelization, at least not without a LOT of thought and careful optimization and planning.
Dual-eye rendering in VR is also a process that's difficult to parallelize: the splitting of the work has to be done very carefully, or the whole thing falls apart and you puke because your eyes are seeing different things. And that's why it's taking the OEMs so long to get it right.