Rendering faces together as one object will not work if you do any kind of per-vertex shading, so you should probably avoid this technique unless you really know what you're doing.
Your world is probably divided into rectangular blocks of fixed size; let's call them chunks. You can avoid rendering any chunks that lie outside of the view frustum. This is easy to implement on top of other optimizations, so you might save it for last.
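A minimal sketch of the chunk-level frustum test, assuming each chunk is described by an axis-aligned bounding box and the frustum by a list of planes whose normals point inward. The names `aabb_outside_plane` and `chunk_visible` and the `(nx, ny, nz, d)` plane layout are illustrative choices, not part of any particular engine:

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]
# Assumed plane layout: (nx, ny, nz, d); a point p is inside when n.p + d >= 0.
Plane = Tuple[float, float, float, float]


def aabb_outside_plane(box_min: Vec3, box_max: Vec3, plane: Plane) -> bool:
    """Return True if the whole box lies on the outside of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner that lies farthest along the plane normal.
    # If even that corner is outside, the entire box is outside.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def chunk_visible(box_min: Vec3, box_max: Vec3, planes: Iterable[Plane]) -> bool:
    """Conservative test: a chunk is culled only if one plane rejects it entirely."""
    return not any(aabb_outside_plane(box_min, box_max, p) for p in planes)
```

The test is conservative: it may keep a few chunks near the frustum corners that are not actually visible, but it never culls a visible one.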
More importantly, you should make sure to only render exterior faces. If you have a large piece of rock, rendering the faces inside the rock will drastically hurt performance. You can implement this by traversing every solid voxel in a chunk and checking whether each of its six neighboring voxels is "solid". If a neighbor is not solid, add the vertices of the face shared with that neighbor to a vertex buffer. By the end, the vertex buffer should only contain "exterior" faces, meaning faces that border empty space. Note that this does not ignore enclosed "holes" inside of an object: their walls still end up in the buffer even if the player can never see them.
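The traversal described above can be sketched as follows. This is an illustration under a few assumptions: the chunk is a nested list indexed as `voxels[x][y][z]` with truthy values meaning solid, anything outside the chunk is treated as empty (a real engine would probably look into the neighboring chunk instead), and each face is recorded as a `(position, direction)` pair rather than as actual vertices:

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, int, int]

# The six axis-aligned neighbor offsets, one per cube face.
NEIGHBOR_OFFSETS: List[Coord] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def is_solid(voxels: Sequence, x: int, y: int, z: int) -> bool:
    """Out-of-range coordinates count as empty (an assumption of this sketch)."""
    if not (0 <= x < len(voxels)
            and 0 <= y < len(voxels[0])
            and 0 <= z < len(voxels[0][0])):
        return False
    return bool(voxels[x][y][z])


def exterior_faces(voxels: Sequence) -> List[Tuple[Coord, Coord]]:
    """Collect every face of a solid voxel that borders a non-solid neighbor."""
    faces = []
    for x in range(len(voxels)):
        for y in range(len(voxels[0])):
            for z in range(len(voxels[0][0])):
                if not voxels[x][y][z]:
                    continue  # empty voxels contribute no faces
                for dx, dy, dz in NEIGHBOR_OFFSETS:
                    if not is_solid(voxels, x + dx, y + dy, z + dz):
                        faces.append(((x, y, z), (dx, dy, dz)))
    return faces
```

For a solid 3x3x3 block this yields 54 faces instead of the 162 you would get by drawing all six faces of every voxel. Removing the center voxel raises the count to 60, which is the "hole" case mentioned above.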
If your voxels can change during gameplay, you have to rerun the above algorithm every time a voxel in the chunk changes, or a voxel in an adjacent chunk that touches the chunk's border changes. Depending on your game, it may run fast enough without modification or you may need to optimize it further.
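One way to decide which chunks need rebuilding is to mark the chunk that owns the changed voxel, plus any face-adjacent chunk when the voxel sits on a chunk border. The sketch below assumes cubic chunks, integer world coordinates, and a hypothetical `CHUNK_SIZE` of 16; `chunks_to_rebuild` is an illustrative name:

```python
from typing import Set, Tuple

CHUNK_SIZE = 16  # hypothetical chunk edge length, in voxels

ChunkCoord = Tuple[int, int, int]


def chunks_to_rebuild(wx: int, wy: int, wz: int,
                      size: int = CHUNK_SIZE) -> Set[ChunkCoord]:
    """Return the chunks whose vertex buffers are stale after one voxel changes."""
    world = (wx, wy, wz)
    # Floor division also maps negative world coordinates to the right chunk.
    chunk = tuple(w // size for w in world)
    dirty = {chunk}
    for axis in range(3):
        local = world[axis] % size  # position of the voxel inside its chunk
        for edge, step in ((0, -1), (size - 1, 1)):
            if local == edge:
                # The voxel touches this border, so the chunk on the other
                # side has a face whose visibility may have changed.
                neighbor = list(chunk)
                neighbor[axis] += step
                dirty.add(tuple(neighbor))
    return dirty
```

Only face-adjacent chunks are considered because the exterior-face check only looks at the six direct neighbors of a voxel. You would typically collect the dirty set over a frame and rebuild each chunk once.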
Lastly, don't underestimate the importance of sorting your chunks front to back. If part of a far-away chunk fails the depth test because something nearer has already been drawn, it can skip some of the steps in the rendering pipeline such as texturing. Before rendering, sort your chunks based on distance from the player, nearest first. While not perfectly front-to-back, this is easy to implement and runs quickly.
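A minimal sketch of the sort, assuming each chunk is represented by its center point and that the player's position doubles as the camera position. The function name `sort_front_to_back` is illustrative:

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sort_front_to_back(chunk_centers: Sequence[Vec3], camera_pos: Vec3) -> List[Vec3]:
    """Return chunk centers ordered nearest-first relative to the camera."""
    def dist_sq(center: Vec3) -> float:
        # Squared distance preserves the ordering and avoids a square root.
        return sum((a - b) ** 2 for a, b in zip(center, camera_pos))

    return sorted(chunk_centers, key=dist_sq)
```

Sorting by chunk center is only approximate, since a large nearby face can belong to a chunk whose center is farther away, but the depth test still guarantees a correct image.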