Mihenk — Turkish LLM Evaluation, by Clastrum

MihenkTurkish LLM evaluation, in the open.

Mihenk is Clastrum's open-source Turkish-LLM evaluation suite. It provides standardized benchmarks for assessing Turkish-language large language models across reasoning, instruction-following, and domain accuracy, so enterprise teams can choose a model on evidence rather than on marketing.

Mihenk is Clastrum's evaluation suite — not the unrelated Mihenk-LLM model family on Hugging Face. The name is a Turkish word for touchstone: the standard you test gold against.

What Mihenk measures

Mihenk benchmarks Turkish-language LLMs across three axes: reasoning (multi-step logical and mathematical tasks in Turkish), instruction-following (alignment between the prompt and the response, calibrated for Turkish syntactic patterns), and domain accuracy (finance, legal, and medical terminology and usage). Results are reported as normalized scores per axis so teams can compare models on the dimensions that matter for their use case.

How to run Mihenk

Mihenk is open-source and available on GitHub at github.com/clastrum/mihenk. The evaluation suite runs locally against any model accessible via an OpenAI-compatible API or a local inference server. Full instructions, dataset files, and scoring scripts are in the repository README.

Why Turkish-specific evaluation matters

Global English leaderboards — MMLU, HumanEval, GSM8K — do not reflect performance on Turkish text. Turkish is morphologically rich and agglutinative; a model that scores well in English may degrade significantly on Turkish reasoning or instruction tasks. Mihenk closes that gap with benchmarks built and scored in Turkish from the ground up.

How Mihenk differs from TR-MMLU and Cetvel

TR-MMLU is a direct translation of the English MMLU dataset — useful for coverage but limited by translation artifacts and missing Turkish-specific domain content. Cetvel focuses on Turkish language correctness. Mihenk is built for enterprise decision-making: its domain accuracy axis covers finance, legal, and regulated-sector tasks that determine real procurement decisions. The three are complementary, not competing.

View on GitHub
Frequently asked questions
What is Mihenk?
Mihenk is Clastrum's open-source Turkish-LLM evaluation suite. It benchmarks Turkish-language large language models across reasoning, instruction-following, and domain accuracy, giving enterprise teams an objective basis for choosing a model in Turkish rather than relying on global English leaderboards.
Is Mihenk the same as Mihenk-LLM?
No. Mihenk (this project) is Clastrum's open-source evaluation suite for Turkish LLMs, hosted at github.com/clastrum/mihenk. The unrelated Mihenk-LLM is a separate model family on Hugging Face with no connection to Clastrum.
How is Mihenk licensed?
Mihenk is released under an open-source license. See the repository at github.com/clastrum/mihenk for the current license terms, dataset credits, and citation guidance.

Start a project → Start a brief with Clastrum

Last updated: June 2026.