
The Future of XR is Hybrid with Controllers AND Hands

Updated: Mar 14

[Image: a VR hand and a controller interacting with red and blue cubes in a minimal virtual space, with a loading bar visible.]
Why not allow hand tracking for one hand and a controller in the other? Footage of multimodal input captured by 'Valem Tutorials'.

XR—Extended Reality—encompasses the spectrum of immersive technologies: Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR). MR, in particular, is redefining immersion. Unlike traditional VR, which isolates users in purely digital worlds, MR headsets use video passthrough cameras to blend real and virtual environments. This fusion allows users to see their actual hands in the virtual space, creating a seamless bridge between physical and digital interaction.


For years, controllers dominated VR input. Their precision and tactile feedback make them indispensable, and when an experience is immersive enough, users can even forget they are holding them. But in MR, where your real hands are visible, hand tracking becomes more important. The shift is driven by the instinctive feel of reaching out to grab a virtual object with your bare hand, a gesture that bridges the gap between physical intuition and digital interaction.


But here’s the problem: the industry is framing this as a binary choice. Controllers or hand tracking? Why not both? By merging the precision of controllers with the intuitive freedom of hand tracking, we can unlock experiences neither input method could achieve alone. Imagine a surgeon adjusting a robotic tool with a controller while gesturing to navigate patient data with their free hand. Or a gamer parrying attacks with a controller-held sword while casting spells through hand motions.


This hybrid approach isn’t just about flexibility—it’s about mirroring real-world behaviour. In the physical world, we rarely use one hand for complex tasks. We pair tools (a chef’s knife and tongs) with natural gestures (adjusting ingredients). Bringing this duality into XR enhances immersion, usability, and accessibility. It also eliminates handedness bias, empowering users to choose which hand wields the controller and which interacts freely.


Why Combine Them?


Think about it: in the real world, we rarely use just one hand for complex tasks. Whether it’s manipulating objects, using tools, or performing actions, our brains naturally rely on both hands in tandem. Translating this into XR brings a level of presence and agency that feels more connected and realistic. By combining controllers and hand tracking, we can mirror these real-world behaviours to create richer, more engaging experiences.


Key Advantages:

  • Precision and Control: Controllers excel at fine motor control and haptic feedback. They’re ideal for tasks that require tactile precision, such as manipulating small objects or navigating complex menus.

  • Intuitive Interaction: Hand tracking shines in more natural, fluid movements. Grabbing, pointing, or making expressive gestures in XR feels incredibly immersive and intuitive.

  • Task Specialisation: With hybrid input, users can specialise tasks between the two methods. For instance, you could use a controller for precise actions (like adjusting settings) while using hand tracking for broader, gestural interactions (like rotating a 3D model or navigating menus).

  • Enhanced Immersion: The combined input methods could elevate the sense of presence and control, making the XR experience feel more seamless and connected.

  • Accessibility and Inclusivity: Offering both input options accommodates different user preferences—some may prefer the tactile feedback of a controller, while others may find hand tracking more intuitive. Plus, the hybrid approach eliminates handedness bias, allowing users to choose which hand holds the controller and which hand uses hand tracking.


Challenges to Overcome


While the hybrid input method holds great potential, it’s not without its challenges:

  • UI/UX Design: Integrating both inputs into a single interface requires careful design to avoid conflicts and provide clear visual cues.

  • Learning Curve: A combined input system may take users some time to adjust to. Affordances (visual or functional cues that indicate how an object or interface element should be used) should make it clear whether something can be manipulated with a controller, a hand, or either; giving the user the freedom to use either method also reduces the learning curve.

  • Performance Optimisation: Processing both controller and hand tracking data simultaneously can be computationally intensive, so it will need careful optimisation and thorough testing.


Transforming Gaming: A Game-Changer


The gaming industry is ripe for the hybrid revolution. Imagine:

  • Dynamic Combat: Block attacks with a controller while casting spells through hand gestures.

  • Puzzle Solving: Rotate puzzle pieces with hand gestures while using a controller to lock solutions.

  • Immersive Environments: Use a controller for precise vehicle control while using hand tracking to interact with instruments.


Real-World Applications: Beyond Gaming


The hybrid approach is more than just a gaming innovation; it has practical applications across various industries:

  • Medical Training & Surgery: Surgeons could use controllers for precise adjustments while leveraging hand tracking for quick menu navigation.

  • Architecture & Design: Users can manipulate a 3D model with a controller while using hand gestures to explore materials or make real-time adjustments.

  • Education & Training: VR classrooms could allow students to hold a virtual pointer with one hand while interacting naturally with learning materials using the other.


Building the Hybrid Future: Tools and Resources


As we've discussed, the potential of hybrid input is clear, but its successful implementation hinges on overcoming key challenges. Thankfully, the development landscape is evolving to support this vision. Key platforms and frameworks are actively embracing and enabling hybrid input approaches.


Meta's Quest Platform leads the way with explicit support. Their developer documentation confirms that apps can enable both hand tracking and controllers simultaneously, stating, "You can enable both hand tracking and controller input in your app. For example, users can pick up and put down controllers while using hand tracking."


[Image: a virtual interface demo showing "Multimodal Input" with VR controllers and "Microgestures" in a digital room, with the text "New ways to interact".]

They call this 'multimodal' support. This direct support is further bolstered by the Meta Interaction SDK, which provides features for mixed input, and by the invaluable First Hand demo on GitHub, which offers a practical working example. Detailed guidelines and best practices for hybrid input design from Meta are still limited, but developers are filling the gap: Valem Tutorials has produced a comprehensive tutorial showing how to set this up.
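
For native Quest apps, which talk to the runtime through OpenXR (covered in the next section), there is also a low-level way to react when a controller is picked up or put down: ask which interaction profile is currently driving each hand. The sketch below is a minimal native illustration, not Meta's Unity-based Interaction SDK; it assumes an initialised XrInstance and XrSession with action sets already attached and synced each frame, and the function name log_per_hand_modality is invented for the example.

```c
/*
 * Sketch: ask the runtime which interaction profile is currently active on
 * each hand, so the app can tell when a controller has been picked up or
 * put down. Assumes an initialised XrInstance/XrSession, action sets
 * attached to the session, and xrSyncActions being called each frame.
 */
#include <openxr/openxr.h>
#include <stdint.h>
#include <stdio.h>

void log_per_hand_modality(XrInstance instance, XrSession session) {
    const char *handPaths[2] = { "/user/hand/left", "/user/hand/right" };

    for (int i = 0; i < 2; ++i) {
        XrPath hand = XR_NULL_PATH;
        xrStringToPath(instance, handPaths[i], &hand);

        /* Which bound interaction profile (controller, hands, ...) is live? */
        XrInteractionProfileState state = { XR_TYPE_INTERACTION_PROFILE_STATE };
        if (xrGetCurrentInteractionProfile(session, hand, &state) != XR_SUCCESS ||
            state.interactionProfile == XR_NULL_PATH) {
            printf("%s: no active input\n", handPaths[i]);
            continue;
        }

        /* Convert the profile path back to a readable string for logging. */
        char profile[XR_MAX_PATH_LENGTH];
        uint32_t written = 0;
        xrPathToString(instance, state.interactionProfile,
                       (uint32_t)sizeof(profile), &written, profile);
        printf("%s is driven by %s\n", handPaths[i], profile);
    }
}
```

Swapping the visible hand model or the on-screen affordances based on the reported profile is one simple way to keep hybrid input legible to the user.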


OpenXR, the open standard for building XR experiences that work across devices, also supports hybrid input. While the specification doesn't use the phrase 'hybrid input', its action system lets developers bind the same actions to both hands and controllers, so users can seamlessly switch between them or use them together. However, as with Meta's implementation, developers need to consider carefully how best to integrate these input methods within their applications.
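
To make that concrete, here is a minimal sketch of how one OpenXR action can be fed by either input method: the same 'grab' action gets suggested bindings for both the Oculus Touch controller profile and the EXT hand interaction profile. It assumes an XrInstance and XrActionSet already exist and that the XR_EXT_hand_interaction extension was enabled at instance creation; the helpers path and setup_grab_action are invented for the example.

```c
/*
 * Sketch: bind one "grab" action to both a controller profile and the
 * EXT hand interaction profile, so the runtime can drive it from whichever
 * input is active. Assumes an initialised XrInstance, an existing
 * XrActionSet, and XR_EXT_hand_interaction enabled on the instance.
 */
#include <openxr/openxr.h>
#include <string.h>

static XrPath path(XrInstance inst, const char *s) {
    XrPath p = XR_NULL_PATH;
    xrStringToPath(inst, s, &p);
    return p;
}

void setup_grab_action(XrInstance instance, XrActionSet actionSet, XrAction *grabAction) {
    /* One float action per hand, usable by controllers or bare hands. */
    XrPath hands[2] = {
        path(instance, "/user/hand/left"),
        path(instance, "/user/hand/right"),
    };

    XrActionCreateInfo actionInfo = { XR_TYPE_ACTION_CREATE_INFO };
    strcpy(actionInfo.actionName, "grab");
    strcpy(actionInfo.localizedActionName, "Grab");
    actionInfo.actionType = XR_ACTION_TYPE_FLOAT_INPUT;
    actionInfo.countSubactionPaths = 2;
    actionInfo.subactionPaths = hands;
    xrCreateAction(actionSet, &actionInfo, grabAction);

    /* Controller bindings: the Touch controller squeeze drives the grab. */
    XrActionSuggestedBinding touchBindings[2] = {
        { *grabAction, path(instance, "/user/hand/left/input/squeeze/value") },
        { *grabAction, path(instance, "/user/hand/right/input/squeeze/value") },
    };
    XrInteractionProfileSuggestedBinding touch = { XR_TYPE_INTERACTION_PROFILE_SUGGESTED_BINDING };
    touch.interactionProfile = path(instance, "/interaction_profiles/oculus/touch_controller");
    touch.suggestedBindings = touchBindings;
    touch.countSuggestedBindings = 2;
    xrSuggestInteractionProfileBindings(instance, &touch);

    /* Hand bindings: a pinch gesture drives the very same grab action. */
    XrActionSuggestedBinding handBindings[2] = {
        { *grabAction, path(instance, "/user/hand/left/input/pinch_ext/value") },
        { *grabAction, path(instance, "/user/hand/right/input/pinch_ext/value") },
    };
    XrInteractionProfileSuggestedBinding handProfile = { XR_TYPE_INTERACTION_PROFILE_SUGGESTED_BINDING };
    handProfile.interactionProfile = path(instance, "/interaction_profiles/ext/hand_interaction_ext");
    handProfile.suggestedBindings = handBindings;
    handProfile.countSuggestedBindings = 2;
    xrSuggestInteractionProfileBindings(instance, &handProfile);
}
```

Because both profiles suggest bindings for the same action, the gameplay code that reads the grab value never needs to know whether a controller squeeze or a bare-hand pinch produced it.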


Unity's XR Interaction Toolkit offers another powerful toolset for building hybrid input experiences, including a component made for exactly this purpose: the XR Input Modality Manager. The toolkit has built-in support for these setups, allowing developers to configure an Input Action Manager to handle both controller and hand tracking inputs. Unity's comprehensive documentation and sample projects provide practical implementation guides, making it easier for developers to get started. However, as with other platforms, clear and comprehensive guidelines for hybrid input design are still under development.


In essence, we see:

  • Explicit Support: Meta's Quest documentation directly confirms that controllers and hand tracking can be used simultaneously, simplifying the development process.

  • Implicit Support: OpenXR and Unity provide robust frameworks to mix input types, empowering developers to implement the necessary logic, such as binding actions to hands and controllers.

  • Practical Examples: Meta's First Hand demo and Unity's XR Interaction Toolkit samples serve as invaluable starting points for developers looking to explore hybrid input.

  • Developer Responsibility: While tools and resources are available, developers bear the responsibility of creating intuitive and effective hybrid input interactions, as specific guidelines are still evolving.

  • Lack of Design Guidelines: While design standards for hybrid input are still emerging, this gap invites creativity—a chance for developers to pioneer intuitive interactions.


By leveraging these tools and resources, developers can effectively address the challenges we discussed earlier and unlock the full potential of hybrid input, creating more intuitive, immersive, and user-friendly XR experiences.


The Hybrid Future of XR: Controllers + Hands

The XR industry is currently debating the merits of hand tracking versus controllers, but the real question isn't 'either/or', it's 'and'. The hybrid future of XR is controllers and hands: combining the precision of controllers with the intuitive freedom of hand tracking offers the best of both worlds. This approach unlocks new levels of immersion, allowing users to seamlessly interact with virtual environments in ways that feel both natural and powerful. As developers and designers continue to explore and embrace this hybrid model, new experiences will emerge.



© CRW Burgess
