We can only know that a set of axioms is accurate by examining the inferences that follow from them to see if any are false. (Of course we can't examine all inferences except in trivial cases, but a systematic search for unanticipated inferences is central to the QA of any ontology in which inference plays a significant role.)
I have watched top logicians spend hours trying to understand the reasoning that led to an obviously false inference from what seemed an obviously correct set of axioms, even with the help of automatic theorem provers, justification finders, etc.
Add to this the difficulty of working with axioms derived from the work of domain experts, and, no matter how clever the tools, there is ample opportunity for incorrect inferences from apparently correct axioms.
If we are going to use logic, then we have to accept that logical inference and precision are not natural to human users, and that we have to debug the resulting inferences just as we have to debug the performance that results from seemingly correct programs.
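As a rough illustration of what a systematic search for unanticipated inferences might look like, here is a minimal Python sketch. It assumes a toy ontology with only subclass and disjointness axioms; the class names, the function names and the tiny closure-based "reasoner" are all hypothetical, invented for this example, and stand in for what a real reasoner would do over a far richer logic.

```python
from itertools import product


def subsumption_closure(asserted):
    """Return the transitive closure of a set of (sub, super) pairs."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so we can add to the set as we go.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def unanticipated(asserted, expected):
    """Inferred subsumptions that were neither asserted nor expected."""
    return subsumption_closure(asserted) - set(asserted) - set(expected)


def unsatisfiable(asserted, disjoint):
    """Classes that end up below both members of some disjoint pair."""
    supers = {}
    for sub, sup in subsumption_closure(asserted):
        supers.setdefault(sub, {sub}).add(sup)
    return {
        cls
        for cls, sups in supers.items()
        if any(x in sups and y in sups for x, y in disjoint)
    }


# Hypothetical axioms, each of which looks reasonable on its own.
ASSERTED = {
    ("Penguin", "Bird"),
    ("Bird", "FlyingAnimal"),
    ("Penguin", "FlightlessAnimal"),
}
DISJOINT = {("FlyingAnimal", "FlightlessAnimal")}

# Inferences nobody wrote down or signed off on: candidates for review.
suspicious = unanticipated(ASSERTED, expected=set())

# Classes that can have no instances: almost always a modelling error.
broken = unsatisfiable(ASSERTED, DISJOINT)
```

Here `suspicious` contains the pair `("Penguin", "FlyingAnimal")` and `broken` contains `"Penguin"`. Neither problem is visible in any single axiom; both appear only once the inferences are computed and inspected, much as a bug appears only when the program is run.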
Regards
Alan