Large language models (LLMs) are increasingly applied to scientific question answering, yet their outputs often contain statements lacking explicit evidence. In high-stakes domains such as biomedicine, ensuring traceability to source documents is essential for interpretability and reliability. While retrieval-augmented generation (RAG) systems leverage external documents, most pipelines do not strictly enforce evidence dependence during answer construction.
We introduce Citation-Driven Extractive Claim Verification (CD-ECV), a deterministic, non-generative extractive framework for scientific claim verification; it is not a hallucination-reduction system for generative LLMs. CD-ECV retrieves biomedical literature via sparse lexical ranking (BM25), applies sentence-level lexical and semantic filtering, and constructs responses exclusively from unmodified evidence spans. Importantly, CD-ECV guarantees source traceability, not factual truth: if retrieved evidence is incorrect or outdated, outputs remain traceable but may not reflect scientific ground truth.
We evaluate CD-ECV on the SciFact benchmark (300 claims labelled SUPPORT, CONTRADICT, or NOT_ENOUGH_INFO), using a corpus of 5183 biomedical passages. Metrics include retrieval recall, evidence selection precision, label accuracy, and abstention rate. All non-abstaining outputs consist entirely of verbatim retrieved evidence spans. These results establish CD-ECV as a deterministic, non-generative extractive baseline for citation-grounded scientific claim verification, providing a transparent and reproducible reference point and enabling future integration with neural validation or generative components.
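
As an illustration of the retrieve, filter, and extract flow described above, the following is a minimal sketch and not the CD-ECV implementation. It assumes a standard Okapi BM25 scoring function with common default parameters (`k1=1.5`, `b=0.75`), a simple token-overlap threshold for the lexical filter, and a regex sentence splitter. The semantic filter and the label assignment step are omitted, and all function names, parameters, and thresholds are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenization (an assumed, simplistic choice)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def split_sentences(passage):
    """Split on sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def extract_evidence(claim, corpus, top_k=2, min_overlap=0.5):
    """Return verbatim evidence sentences with their source index.

    An empty list signals abstention. `min_overlap` is a hypothetical
    threshold: the fraction of claim tokens a sentence must contain.
    """
    claim_terms = set(tokenize(claim))
    if not claim_terms:
        return []
    scores = bm25_scores(claim, corpus)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    evidence = []
    for i in ranked[:top_k]:
        if scores[i] <= 0:
            continue  # passage shares no term with the claim
        for sent in split_sentences(corpus[i]):
            overlap = len(claim_terms & set(tokenize(sent))) / len(claim_terms)
            if overlap >= min_overlap:
                # The span is copied unmodified, so it stays traceable.
                evidence.append({"doc_id": i, "span": sent})
    return evidence
```

Calling `extract_evidence(claim, corpus)` on a list of passage strings returns each selected sentence together with the index of the passage it came from. Because spans are never rewritten, every returned string is a substring of its source passage, which is the traceability property the abstract describes. An empty result corresponds to abstention.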