The Journey
Chapter 1: It Started with a Strange Question
Most engineers are comfortable with first-order and second-order derivatives. I started my research career at National University of Sciences and Technology (NUST), Pakistan, by asking a question that makes most people pause: what if you took the 1.3th-order derivative of a function?
That's not a typo. Fractional calculus, the generalization of differentiation and integration to arbitrary (real or even complex) orders, is a real branch of mathematics, and it turns out to be remarkably useful for modeling physical systems that classical calculus struggles with: viscoelastic materials, anomalous diffusion, signal processing. For my Master's thesis at NUST, I formalized fractional calculus in higher-order logic using the HOL theorem prover, building the mathematical infrastructure to reason about fractional-order systems with machine-checked precision. The work was published at FMCAD 2011 and included a formalization of the Gamma function that appeared in the Journal of Automated Reasoning.
The idea was simple, even if the math wasn't: if you're going to trust a physical system with human lives, maybe a computer should check the math, not just run simulations.
Chapter 2: Formalizing Physics Before It Was Cool
That question led me to the Hardware Verification Group (HVG) at Concordia University in Montreal, where I pursued my Ph.D. under Sofiène Tahar. The group had a bold ambition: formally verify properties of real physical systems, not just software or hardware, but the physics itself.
This was the early 2010s, years before projects like Physlib (formerly PhysLean) or Lean4Physics brought the idea of “digitalized physics” into the spotlight with Lean 4. We were doing this work in HOL Light, building machine-checked formalizations of geometrical optics, light rays, beams, lenses, mirrors, optical cavities, from the ground up.
My Ph.D. thesis, Formal Analysis of Geometrical Optics using Theorem Proving (2015), wasn't an abstract exercise. I used the framework to formally verify the stability of optical resonators, the kind of components found in laser surgery equipment, aerospace instruments, and the receiver module of the APEX telescope. That resonator stability work was published at the NASA Formal Methods Symposium (NFM 2013), putting formal verification of physics squarely in front of the aerospace and safety community.
I also collaborated with Cezary Kaliszyk (University of Innsbruck) and Josef Urban (Radboud University) on a paper exploring the use of learning-assisted automated reasoning for physics, Formalizing Physics: Automation, Presentation and Foundation Issues (CICM 2015). The paper investigated whether AI-driven proof automation could match interactive proofs for the verification of optical systems. In hindsight, this was an early glimpse of the question that would define my later career: what happens when AI meets safety-critical reasoning?
At the time, this work was met with skepticism. Formalizing real-world physics in a theorem prover felt like an academic curiosity to most people. The tooling was clunky, the learning curve was steep, and the practical payoff seemed distant. But the world caught up. The rise of Lean 4 as a mature proof assistant, combined with AI-driven proof automation, has turned formal verification into one of the most exciting frontiers in mathematics and computer science. Fields Medalist Terence Tao is now actively formalizing his own textbooks in Lean, found a bug in one of his published papers through the process, and has spoken publicly about formal verification enabling human-AI collaboration in mathematics at scale. Google DeepMind's AlphaProof used Lean to solve International Mathematical Olympiad problems through reinforcement learning. By 2025, multiple independent AI systems were achieving gold-medal performance on the IMO with fully verified Lean proofs, and Lean received the ACM SIGPLAN Award with the citation that it “has become the de facto choice for AI-based systems of mathematical reasoning.” The ideas that our group at HVG was exploring in HOL Light a decade ago, machine-checked proofs of real mathematical and physical properties, guided by AI, are now at the center of a global research movement. The difference is that we were applying them to safety-critical physical systems from the start.
Chapter 3: When One Execution Trace Isn't Enough
After my Ph.D., I took a detour that would quietly shape everything that followed. As an NSERC Fellow, I worked with Borzoo Bonakdarpour (now at Michigan State University) on a deceptively simple-sounding problem: what happens when you can't verify a system's security by looking at just one execution?
Most verification techniques watch a single run of a system and check whether it satisfies a property. But some of the most important security and privacy guarantees, like non-interference, are fundamentally about the relationship between multiple executions. For example: if a user changes their secret password and re-runs the same program, does the publicly visible output stay the same? If it doesn't, an attacker who observes two runs can deduce something about the secret. You can't catch that by monitoring one trace. You need to compare traces, and that's a hyperproperty.
Think of it this way: imagine you're testing a voting system. You want to guarantee that swapping two voters' ballots doesn't change the election outcome (ballot secrecy). That's not a property of any single election, it's a property about pairs of elections. Classical verification tools can't express this. HyperLTL can.
The challenge was that monitoring hyperproperties at runtime was thought to be prohibitively expensive: in the worst case you'd need to store and compare an exponential number of trace combinations. Together with Noel Brett and Bonakdarpour, I developed a rewriting-based runtime verification technique for alternation-free HyperLTL that sidestepped this explosion. Instead of enumerating all trace combinations, the technique progressively rewrites the formula as new observations arrive, achieving space complexity independent of the number of trace quantifiers. The work was published at TACAS 2017, one of the premier venues in formal methods, and has become a foundational reference in the hyperproperties monitoring literature.
In retrospect, this research fellowship bridged two halves of my career. The theorem proving work had been about verifying physics (optics, resonators). The hyperproperties work was about verifying system behaviors (security, privacy, information flow). Industry would require both: systems that are physically safe and behaviorally secure.
Chapter 4: From Proofs to Products
Looking back, there's a thread running through everything so far: mathematics (fractional calculus, gamma functions), then physics (optics, resonators, electromagnetic theory), then cyber-physical systems (hyperproperties, runtime verification of software interacting with the real world). Each step moved closer to the systems that actually touch people's lives. Industry was the natural next step, not away from safety, but all the way into it.
At BlackBerry QNX in Ottawa, I worked on certifying the QNX real-time operating system for safety-critical applications under ISO 26262. QNX runs in over 300 million vehicles worldwide. Certifying it means every line of reasoning has to hold up under the scrutiny of functional safety auditors. This is where I learned that formal rigor without practical process discipline is just an academic exercise. The real challenge isn't proving a theorem. It's building a safety case that an assessor, a regulator, and an engineer can all understand and trust.
At Magna International, I worked on ADAS (Advanced Driver-Assistance Systems): the safety systems that sit between a human driver and the road, making split-second decisions about braking, steering, and collision avoidance. This was my first direct exposure to AI-driven perception in safety-critical loops, where a neural network's confidence score can be the difference between a car stopping and a car not stopping.
At the Bosch Research and Technology Center in Sunnyvale, California, I dove into autonomous driving. Bosch's research program included joint work with Mercedes-Benz on autonomous vehicle development, and I was at the intersection of safety engineering and the AI-driven systems that were rapidly transforming the automotive industry. The scale was different here: not a single ADAS feature, but the entire driving task, handed over to software.
Then came Apple's Special Projects Group (SPG), Apple's autonomous systems division. I specialized in architecting fault-tolerant systems for both autonomous vehicles and mobile robotics, working across the full stack from high-level conceptual design to detailed software and hardware engineering. That meant fail-safe, fail-degraded, and fail-operational system architectures; AI/ML-driven perception and planning algorithms; and sensor technologies including camera, LiDAR, RADAR, and IMU. At Apple, “power down” is not a safe state, so every design decision has to account for what happens when things go wrong and the system has to keep functioning safely. Working at that level taught me something that formal methods alone never could: safety for these systems is fundamentally an architecture problem, not just a verification problem. You can't bolt it on afterward. It has to be designed in from the first line of the requirements.
Across all of these, QNX, Magna, Bosch, Apple, the pattern was the same: increasingly complex autonomous systems, increasingly AI-driven, and a safety engineering discipline that was struggling to keep up.
Chapter 5: The Gap (and a Name for It)
By 2020, I had seen the same problem from every angle: safety engineering practices were built for a world of deterministic software and well-understood failure modes. But autonomous systems, with their machine learning components, massive data pipelines, continuous updates, and complex physical-world interactions, didn't fit that mold.
Safety teams were often disconnected from development teams. Safety analysis was done retrospectively, not continuously. Tools were fragmented. Feedback loops were slow or nonexistent.
So I wrote a position paper and gave the potential solution a name: SafetyOps.
SafetyOps is a framework that combines DevOps, TestOps, DataOps, and MLOps with safety engineering, creating a continuous, traceable, and efficient safety lifecycle for autonomous systems. The idea: safety isn't a gate you pass at the end; it's a discipline woven into every iteration, every data pipeline, every deployment. Published in 2020 while at Bosch, SafetyOps has since influenced thinking around continuous safety assurance for AI-driven systems.
Chapter 6: Shaping the Standards
Alongside the industry work, I've been contributing to the standards that define what “safe” actually means for these systems.
Through the Standards Council of Canada, I participate in ISO/TC 22/SC 32, which is responsible for ISO 26262, the foundational standard for automotive functional safety. As autonomous systems expanded beyond cars into robots and humanoids, I began contributing to ISO/TC 299 (Robotics), which develops safety requirements for industrial robots, service robots, and collaborative robots. Its Working Group 12 is now tackling one of the newest and most consequential categories: the safety of dynamically stable mobile robots, including legged humanoids entering factories and warehouses.
I also work with standards that address AI safety directly. ISO/PAS 8800 (published December 2024) extends the automotive safety framework to cover AI-specific risks: what happens when a neural network inside a vehicle produces an insufficient output, exhibits a systematic error, or fails due to random hardware faults. ISO/IEC TR 5469 (published January 2024) takes a broader view, providing a cross-domain framework for using AI inside any safety-related function, whether that's a car, a medical device, or an industrial robot.
These aren't just documents on a shelf. They are the rules that determine whether an autonomous car, a warehouse robot, or a humanoid can be legally deployed. Being part of shaping them means the lessons I've learned in practice feed back into the frameworks that govern the entire industry.
Chapter 7: The Competence Shadow
In 2026, I published a second foundational concept: The Competence Shadow.
As AI assistants (including LLMs) started being used in safety analysis workflows, I asked a question nobody was formalizing: does AI assistance actually improve safety analysis, or does it introduce blind spots that only surface after something goes wrong?
The Competence Shadow is my answer. It is the systematic narrowing of human reasoning induced by AI-generated safety analysis: not what the AI shows you, but what it prevents you from considering. The paper develops a formal framework with five dimensions of safety competence (domain knowledge, standards expertise, operational experience, contextual understanding, and judgment), defines four collaboration structures between humans and AI, and proves that the shadow compounds multiplicatively. A small erosion in each dimension can produce catastrophic degradation of the overall analysis.
The central finding: AI assistance in safety engineering is a collaboration design problem, not a software procurement decision. The same tool degrades or improves analysis quality depending entirely on how it is structured into the workflow.
Chapter 8: reasonX Labs
All of this, the formal methods background, the industry experience, the SafetyOps and Competence Shadow frameworks, converged in reasonX Labs, which I co-founded and lead as CEO.
Chapter 9: From Physical AI to Frontier AI Safety
The Competence Shadow paper had asked what happens when AI enters safety workflows for robots and cars. But by early 2026, a bigger version of the same question was forming: what happens when the AI system itself is the thing that needs a safety case?
Frontier AI models are increasingly subject to the same kind of safety assurance demands that I had spent my career applying to physical systems. Governments and AI Safety Institutes were beginning to ask developers to produce safety cases: structured arguments, supported by evidence, that the risk of harm from a model is within acceptable bounds. The concept was borrowed directly from the safety-critical systems world I knew well.
In early 2026, I joined the Arcadia Impact AI Governance Taskforce (Winter 2026 cohort), a selective research fellowship focused on reducing catastrophic risks from advanced AI. As part of the safety cases team, we took Google DeepMind's publicly released safety case for scheming inability and subjected it to independent external review. The resulting paper, Lessons from External Review of DeepMind's Scheming Inability Safety Case (April 2026), surfaced substantive new concerns and provided concrete recommendations for how external review of AI safety cases should be conducted.
The domain shifted from cars and robots to foundation models, but the core discipline remained the same: structured safety arguments need independent scrutiny, and the people reviewing them need the right frameworks to do it well.
Chapter 10: What Comes Next
The arc of this journey follows a path: mathematics, then physics, then cyber-physical systems, then Physical AI, and through all of it, the question of how to make these systems safe.
Fractional calculus taught me that the right formalism can unlock problems that brute-force methods can't touch. Formalizing optics taught me that physics itself can be machine-checked. Hyperproperties taught me that the hardest security guarantees require reasoning about relationships between executions, not just individual runs. Industry taught me that none of this matters if you can't translate it into systems that real people trust with their lives. And the standards work taught me that safety has to be a shared language, not just a private discipline.
I presented some of these ideas at the International Association for Safe and Ethical AI (IASEAI 2025) in a talk titled Safety of AI-Powered Robots: Dimensions, Challenges, and a Way Forward, distilling what I've learned across a decade of practice about what's actually hard about making AI-powered physical systems safe, and what a credible path forward looks like.
The mission hasn't changed since the 1.3th derivative:
Physical AI's societal impact demands trustworthy deployment.
What has changed is the scope. Safety assurance now spans self-driving cars, humanoid robots, mobile robotics, industrial automation, air mobility, and the foundation models that increasingly power all of them.
Right now, I'm deep in the space between Vision-Language Models (VLMs), Vision-Language-Action Models (VLAs), and the reality of deploying them on embedded systems under constrained compute. How do you take these powerful models and run them on hardware that has real thermal limits, real power budgets, and real-time deadlines? And more importantly, how do you make them fault-tolerant by design, so that when something goes wrong (and it will), the system degrades gracefully instead of catastrophically? I believe this is the transformational next step in Physical AI: not just making these models work in the lab, but making them safe and reliable on the actual robots and vehicles that operate in the real world.
If any of this resonates, I'd love to hear from you. Get in touch →
The work continues...