Large Language Models

LLM-only learners leverage the power of large language models to perform ontology learning tasks without using retrieval components. This approach is particularly useful when you want to rely on the model’s inherent knowledge rather than specific examples from the training data.

Loading Ontological Data

We start by importing the necessary components from the ontolearner package, loading an ontology, and creating a train-test split.

from ontolearner import AutoLLMLearner, AgrO, train_test_split, LabelMapper, StandardizedPrompting, evaluation_report

ontology = AgrO()

ontology.load()

ontological_data = ontology.extract()

train_data, test_data = train_test_split(ontological_data, test_size=0.2, random_state=42)
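To make the roles of test_size and random_state concrete, here is a minimal pure-Python sketch of a seeded split over a plain list. The helper simple_split and the toy (term, type) pairs are hypothetical; OntoLearner's train_test_split operates on the extracted ontology data and may work differently internally.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def simple_split(
    items: Sequence[T], test_size: float = 0.2, random_state: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a fixed seed, then cut off the test share."""
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible
    random.Random(random_state).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Toy (term, type) pairs, invented for illustration
pairs = [(f"term_{i}", "type_a" if i % 2 else "type_b") for i in range(10)]
train, test = simple_split(pairs)
print(len(train), len(test))  # 8 2
```

Calling simple_split again with the same seed returns the same partition, which is what makes experiments repeatable.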

Note

AutoLLMLearner: A wrapper class to easily configure and run LLM-based learners.
LabelMapper: Maps generated outputs to the specified classes.
StandardizedPrompting: A default prompting strategy for prompting LLMs in a consistent way.
evaluation_report: An evaluation method for LLMs4OL tasks.
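The idea behind label mapping can be illustrated with a small stand-in. The sketch below is not OntoLearner's LabelMapper: KeywordLabelMapper, its default argument, and the word lists are assumptions made for illustration. It only shows how free-form generated text might be reduced to one label from a fixed set.

```python
import re
from typing import Dict, List


class KeywordLabelMapper:
    """Hypothetical stand-in that maps generated text to a label by keyword."""

    def __init__(self, label_dict: Dict[str, List[str]], default: str) -> None:
        self.label_dict = label_dict
        self.default = default

    def predict(self, outputs: List[str]) -> List[str]:
        labels = []
        for text in outputs:
            # Compare whole words so that "incorrect" does not match "correct"
            words = set(re.findall(r"[a-z]+", text.lower()))
            match = next(
                (
                    label
                    for label, keywords in self.label_dict.items()
                    if words & set(keywords)
                ),
                self.default,
            )
            labels.append(match)
        return labels


mapper = KeywordLabelMapper(
    {"yes": ["yes", "true", "correct"], "no": ["no", "false", "incorrect"]},
    default="unknown",
)
print(mapper.predict(["Yes, it is.", "That is false.", "???"]))
# ['yes', 'no', 'unknown']
```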

Initialize Learner

Before defining the LLM learner, choose the task you want the LLM to perform. The available tasks are described in LLMs4OL Paradigms. The task IDs are: ‘term-typing’, ‘taxonomy-discovery’, ‘non-taxonomic-re’.

task = 'non-taxonomic-re'

Next, to use LLMs hosted on HuggingFace or other providers that require a token, provide a valid access token:

token = '...'
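Rather than hard-coding the token in a script, it can be read from the environment. This is a minimal sketch, assuming the token is stored in HF_TOKEN (the variable name that huggingface_hub reads by default); read_token is a hypothetical helper and not part of OntoLearner.

```python
import os
from typing import Optional


def read_token(var_name: str = "HF_TOKEN") -> Optional[str]:
    """Return the access token from the environment, or None if unset or empty."""
    return os.environ.get(var_name) or None


token = read_token()
if token is None:
    print("HF_TOKEN is not set; gated models may fail to download.")
```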

Set up the learner with your prompting and label mapping strategies, then load the desired model:

llm_learner = AutoLLMLearner(
    prompting=StandardizedPrompting,
    label_mapper=LabelMapper(),
    token=token
)
llm_learner.load(model_id='Qwen/Qwen2.5-0.5B-Instruct')

Next, fit the model and make predictions:

llm_learner.fit(train_data, task=task)

predicts = llm_learner.predict(test_data, task=task)

truth = llm_learner.tasks_ground_truth_former(data=test_data, task=task)

metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)

print(metrics)

This prints the evaluation results.
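For intuition about what such a report contains, the sketch below computes precision, recall, and F1 over sets of (term, type) pairs. It is an illustration under assumptions, not the implementation of evaluation_report: the function prf1 and the toy pairs are hypothetical, and the matching rules OntoLearner applies per task may differ.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def prf1(y_true: Set[Pair], y_pred: Set[Pair]) -> Dict[str, float]:
    """Set-based precision, recall, and F1 over exact pair matches."""
    true_positives = len(y_true & y_pred)
    precision = true_positives / len(y_pred) if y_pred else 0.0
    recall = true_positives / len(y_true) if y_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}


# Toy data, invented for illustration
gold = {("maize", "crop"), ("wheat", "crop"), ("tillage", "practice"), ("nitrogen", "nutrient")}
predicted = {("maize", "crop"), ("tillage", "crop"), ("nitrogen", "nutrient")}

scores = prf1(gold, predicted)
print(scores)  # 2 of 3 predictions are correct; 2 of 4 gold pairs are found
```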

Hint

OntoLearner supports various LLM models, including (but not limited to):

Mistral models (e.g., “mistralai/Mistral-7B-Instruct-v0.1”)
Llama models (e.g., “meta-llama/Llama-3.1-8B-Instruct”)
Qwen models (e.g., “Qwen/Qwen3-0.6B”)
DeepSeek models (e.g., “deepseek-ai/deepseek-llm-7b-base”)
…

Pipeline Usage

The OntoLearner package also offers a streamlined LearnerPipeline class that simplifies initialization, training, prediction, and evaluation into a single call. In this section, we run the pipeline in LLM-only mode by setting llm_id only.

# Import the main components from the OntoLearner library
from ontolearner import LearnerPipeline, AgrO, train_test_split

# Load the AgrO ontology, which contains agricultural concepts and relationships
ontology = AgrO()
ontology.load()  # Parse and initialize internal ontology structures, including term-type pairs

# Extract annotated examples (terms and their types), and split into train/test sets
train_data, test_data = train_test_split(
    ontology.extract(),     # Extract raw (term, types) instances from the ontology
    test_size=0.2,          # 20% of the data is reserved for evaluation
    random_state=42         # Ensure reproducibility by setting a fixed seed
)

# Set up the learner pipeline using a lightweight instruction-tuned LLM
pipeline = LearnerPipeline(
    llm_id='Qwen/Qwen2.5-0.5B-Instruct',   # LLM-only mode
    hf_token='...',                        # Hugging Face access token for loading gated models
    batch_size=32                          # Batch size for parallel inference (if applicable)
)

# Run the full learning pipeline on the term-typing task
outputs = pipeline(
    train_data=train_data,
    test_data=test_data,
    evaluate=True,               # Enables automatic computation of precision, recall, F1
    task='term-typing'           # The task is to classify terms into semantic types
)

# Display the evaluation results
print("Metrics:", outputs['metrics'])          # Shows {'precision': ..., 'recall': ..., 'f1_score': ...}

# Display total elapsed time for training + prediction + evaluation
print("Elapsed time:", outputs['elapsed_time'])

# Print all returned outputs (including predictions)
print(outputs)
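Conceptually, a pipeline call of this kind chains fitting, prediction, evaluation, and timing. The sketch below illustrates that flow with plain callables. run_pipeline and the majority-class stub are hypothetical and do not reflect LearnerPipeline's internals; the 'predictions' key is an assumed name, while 'metrics' and 'elapsed_time' follow the keys used above.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (term, label), a simplified stand-in for ontology data


def run_pipeline(
    fit: Callable[[List[Example]], None],
    predict: Callable[[List[Example]], List[str]],
    evaluate: Callable[[List[Example], List[str]], Dict[str, float]],
    train_data: List[Example],
    test_data: List[Example],
) -> Dict[str, Any]:
    """Run fit, predict, and evaluate in order and record the elapsed time."""
    start = time.perf_counter()
    fit(train_data)
    predictions = predict(test_data)
    metrics = evaluate(test_data, predictions)
    elapsed = time.perf_counter() - start
    return {"predictions": predictions, "metrics": metrics, "elapsed_time": elapsed}


# A trivial majority-class "learner" used only to exercise the flow
state: Dict[str, str] = {}


def fit(train: List[Example]) -> None:
    labels = [label for _, label in train]
    state["majority"] = max(set(labels), key=labels.count)


def predict(test: List[Example]) -> List[str]:
    return [state["majority"] for _ in test]


def evaluate(test: List[Example], preds: List[str]) -> Dict[str, float]:
    correct = sum(pred == label for (_, label), pred in zip(test, preds))
    return {"accuracy": correct / len(test)}


toy_train = [("maize", "crop"), ("wheat", "crop"), ("tillage", "practice")]
toy_test = [("rice", "crop"), ("mulching", "practice")]
toy_outputs = run_pipeline(fit, predict, evaluate, toy_train, toy_test)
print(toy_outputs["metrics"])  # {'accuracy': 0.5}
```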

Custom AutoLLM

OntoLearner provides a default AutoLLM wrapper for handling popular model families (Mistral, Llama, Qwen, etc.) through HuggingFace or external providers. However, in some cases you may want to integrate a model family that is not natively supported (e.g., Falcon, DeepSeek, or a proprietary LLM).

For this, you can extend the AutoLLM class and implement the required load and generate methods. Basic requirements are:

Inherit from AutoLLM
Implement load(model_id) if your model requires different loading logic (for example, mistralai/Mistral-Small-3.2-24B-Instruct-2506 is loaded differently from standard HuggingFace models)
Implement generate(inputs, max_new_tokens) to encode prompts, perform generation, decode outputs, and map them to labels.

Falcon-H

The following example shows how to build a Falcon integration:

from ontolearner import AutoLLM
from typing import List
import torch

class FalconLLM(AutoLLM):

    @torch.no_grad()
    def generate(self, inputs: List[str], max_new_tokens: int = 50) -> List[str]:
        encoded_inputs = self.tokenizer(
            inputs,
            return_tensors="pt",
            padding=True,
            truncation=True
        ).to(self.model.device)

        input_ids = encoded_inputs["input_ids"]
        input_length = input_ids.shape[1]

        outputs = self.model.generate(
            input_ids,
            attention_mask=encoded_inputs["attention_mask"],  # ignore padding tokens
            max_new_tokens=max_new_tokens,
            pad_token_id=self.tokenizer.eos_token_id
        )

        generated_tokens = outputs[:, input_length:]
        decoded_outputs = [
            self.tokenizer.decode(g, skip_special_tokens=True).strip()
            for g in generated_tokens
        ]

        return self.label_mapper.predict(decoded_outputs)

Mistral-Small

For Mistral, you can integrate the official mistral-common tokenizer and chat completion interface:

from ontolearner import AutoLLM
from typing import List
import torch

class MistralLLM(AutoLLM):

    def load(self, model_id: str) -> None:
        from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
        from transformers import Mistral3ForConditionalGeneration

        self.tokenizer = MistralTokenizer.from_hf_hub(model_id)

        device_map = "cpu" if self.device == "cpu" else "balanced"
        self.model = Mistral3ForConditionalGeneration.from_pretrained(
            model_id,
            device_map=device_map,
            torch_dtype=torch.bfloat16,
            token=self.token
        )

        if not hasattr(self.tokenizer, "pad_token_id") or self.tokenizer.pad_token_id is None:
            self.tokenizer.pad_token_id = self.model.generation_config.eos_token_id

        self.label_mapper.fit()

    @torch.no_grad()
    def generate(self, inputs: List[str], max_new_tokens: int = 50) -> List[str]:
        from mistral_common.protocol.instruct.request import ChatCompletionRequest

        tokenized_list = []
        for prompt in inputs:
            messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
            tokenized = self.tokenizer.encode_chat_completion(ChatCompletionRequest(messages=messages))
            tokenized_list.append(tokenized.tokens)

        # Pad inputs and create attention masks
        max_len = max(len(tokens) for tokens in tokenized_list)
        input_ids, attention_masks = [], []
        for tokens in tokenized_list:
            pad_length = max_len - len(tokens)
            input_ids.append(tokens + [self.tokenizer.pad_token_id] * pad_length)
            attention_masks.append([1] * len(tokens) + [0] * pad_length)

        input_ids = torch.tensor(input_ids).to(self.model.device)
        attention_masks = torch.tensor(attention_masks).to(self.model.device)

        outputs = self.model.generate(
            input_ids=input_ids,
            attention_mask=attention_masks,
            eos_token_id=self.model.generation_config.eos_token_id,
            pad_token_id=self.tokenizer.pad_token_id,
            max_new_tokens=max_new_tokens,
        )

        decoded_outputs = []
        for i, tokens in enumerate(outputs):
            # Each output row is the padded prompt followed by the new tokens
            output_text = self.tokenizer.decode(tokens[input_ids.shape[1]:])
            decoded_outputs.append(output_text)

        return self.label_mapper.predict(decoded_outputs)
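The manual padding step in generate can be tried in isolation. The sketch below reproduces the same right-padding and attention-mask logic on plain Python lists, with made-up token IDs and a hypothetical helper name pad_batch, so it runs without a model or tokenizer.

```python
from typing import List, Tuple


def pad_batch(
    token_lists: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build attention masks."""
    max_len = max(len(tokens) for tokens in token_lists)
    input_ids, attention_masks = [], []
    for tokens in token_lists:
        pad_length = max_len - len(tokens)
        input_ids.append(tokens + [pad_id] * pad_length)
        # 1 marks a real token, 0 marks padding
        attention_masks.append([1] * len(tokens) + [0] * pad_length)
    return input_ids, attention_masks


ids, masks = pad_batch([[5, 6, 7], [8]], pad_id=0)
print(ids)    # [[5, 6, 7], [8, 0, 0]]
print(masks)  # [[1, 1, 1], [1, 0, 0]]
```

Every row ends up with the same length, which is what allows the batch to be converted into a single tensor.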

Logit LLM

The following example shows how OntoLearner computes logit-based label probabilities. Instead of generating text, it reads the next-token logits from a single forward pass, which reduces experimentation time and improves efficiency:

Hint

To use a Mistral LLM with the logit-based approach, use the LogitMistralLLM class.
A quantized variant of the logit-based approach is available through the LogitQuantLLM class.

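from ontolearner import AutoLLM
from typing import List
import torch
import torch.nn.functional as F

class LogitAutoLLM(AutoLLM):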
    def _get_label_token_ids(self):
        label_token_ids = {}
        for label, words in self.label_mapper.label_dict.items():
            ids = []
            for w in words:
                token_ids = self.tokenizer.encode(w, add_special_tokens=False)
                ids.append(token_ids)
            label_token_ids[label] = ids
        return label_token_ids

    def load(self, model_id: str) -> None:
        super().load(model_id)
        self.label_token_ids = self._get_label_token_ids()

    @torch.no_grad()
    def generate(self, inputs: List[str], max_new_tokens: int = 1) -> List[str]:
        encoded = self.tokenizer(inputs, return_tensors="pt", truncation=True, padding=True).to(self.model.device)
        outputs = self.model(**encoded)
        logits = outputs.logits # logits: [batch, seq_len, vocab]
        last_logits = logits[:, -1, :]  # [batch, vocab] # we only care about the NEXT token prediction
        probs = F.softmax(last_logits, dim=-1)
        predictions = []
        for i in range(probs.size(0)):
            label_scores = {}
            for label, token_id_lists in self.label_token_ids.items():
                score = 0.0
                for token_ids in token_id_lists:
                    if len(token_ids) == 1:
                        score += probs[i, token_ids[0]].item()
                    else:
                        score += probs[i, token_ids[0]].item() # multi-token fallback (rare but safe)
                label_scores[label] = score
            predictions.append(max(label_scores, key=label_scores.get))
        return predictions
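The scoring step can be followed without a model. The sketch below applies the same idea to a toy vocabulary of six tokens: softmax over the next-token logits, then, for each label, the sum of the probabilities of the first token of each of its words. The token IDs, logits, and the helper names softmax and pick_label are invented for illustration.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(last_logits: List[float], label_token_ids: Dict[str, List[List[int]]]) -> str:
    """Score each label by the probability mass of its words' first tokens."""
    probs = softmax(last_logits)
    scores = {
        label: sum(probs[token_ids[0]] for token_ids in id_lists)
        for label, id_lists in label_token_ids.items()
    }
    return max(scores, key=scores.get)


# Hypothetical token IDs: "no" has one word that splits into two tokens
toy_label_token_ids = {"yes": [[1], [4]], "no": [[2], [5, 3]]}
toy_last_logits = [0.1, 2.0, 1.5, 0.0, 0.3, 0.2]
print(pick_label(toy_last_logits, toy_label_token_ids))  # yes
```

Because only the first token of a multi-token word is scored, two labels whose words share a first token cannot be told apart by this scheme.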

Qwen3-Thinking LLM

The Qwen3 thinking model requires a different inference procedure, similar to the Mistral LLM. The following example shows how such a model is used within OntoLearner. You only need to import the QwenThinkingLLM class and use it.

from ontolearner import AutoLLM
from typing import List
import torch

class QwenThinkingLLM(AutoLLM):
    @torch.no_grad()
    def generate(self, inputs: List[str], max_new_tokens: int = 50) -> List[str]:
        messages = [[{"role": "user", "content": prompt + " Please show your final response with 'answer': 'label'."}]
                    for prompt in inputs]
        texts = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        encoded_inputs = self.tokenizer(texts, return_tensors="pt", padding=True).to(self.model.device)
        generated_ids = self.model.generate(**encoded_inputs, max_new_tokens=max_new_tokens)
        decoded_outputs = []
        for i in range(len(generated_ids)):
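            # generate() returns the padded prompt followed by the new tokens,
            # so slice at the padded prompt width, not the non-pad token count
            prompt_len = encoded_inputs["input_ids"].shape[1]
            output_ids = generated_ids[i][prompt_len:].tolist()
            try:
                # 151668 is the ID of the </think> token in the Qwen3 tokenizer
                end = len(output_ids) - output_ids[::-1].index(151668)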
                thinking_ids = output_ids[:end]
            except ValueError:
                thinking_ids = output_ids
            thinking_content = self.tokenizer.decode(thinking_ids, skip_special_tokens=True).strip()
            decoded_outputs.append(thinking_content)
        return self.label_mapper.predict(decoded_outputs)
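The generate method above locates the last occurrence of token ID 151668, which the Qwen3 model card identifies as the closing </think> token. The reverse-index trick can be checked on plain lists. split_at_last is a hypothetical helper, and the other token IDs are made up.

```python
from typing import List, Tuple

THINK_END_ID = 151668  # token ID taken from the example above


def split_at_last(
    output_ids: List[int], marker: int = THINK_END_ID
) -> Tuple[List[int], List[int]]:
    """Split at the last `marker`; the marker stays in the first part."""
    try:
        # Searching the reversed list finds the last occurrence
        end = len(output_ids) - output_ids[::-1].index(marker)
    except ValueError:
        # No marker found: treat everything as the first part
        return output_ids, []
    return output_ids[:end], output_ids[end:]


print(split_at_last([11, 12, THINK_END_ID, 21, 22]))
# ([11, 12, 151668], [21, 22])
print(split_at_last([1, 2, 3]))
# ([1, 2, 3], [])
```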

Qwen3-Instruct LLM

As with the Qwen3 thinking model, the instruct variant requires a different inference procedure. The following example shows how such a model is used within OntoLearner. You only need to import the QwenInstructLLM class and use it.

from ontolearner import AutoLLM
from typing import List

class QwenInstructLLM(AutoLLM):

    def generate(self, inputs: List[str], max_new_tokens: int = 50) -> List[str]:
        messages = [[{"role": "user", "content": prompt + " Please show your final response with 'answer': 'label'."}]
                    for prompt in inputs]

        texts = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

        encoded_inputs = self.tokenizer(texts, return_tensors="pt", padding="max_length", truncation=True,
                                        max_length=256).to(self.model.device)

        generated_ids = self.model.generate(**encoded_inputs,
                                            max_new_tokens=max_new_tokens,
                                            use_cache=False,
                                            pad_token_id=self.tokenizer.pad_token_id,
                                            eos_token_id=self.tokenizer.eos_token_id)
        decoded_outputs = []
        for i in range(len(generated_ids)):
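            # generate() returns the padded prompt followed by the new tokens,
            # so slice at the padded prompt width, not the non-pad token count
            prompt_len = encoded_inputs["input_ids"].shape[1]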
            output_ids = generated_ids[i][prompt_len:].tolist()
            output_content = self.tokenizer.decode(output_ids, skip_special_tokens=True).strip()
            decoded_outputs.append(output_content)
        return self.label_mapper.predict(decoded_outputs)
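Both Qwen3 classes append an instruction asking the model to finish with 'answer': 'label'. In the classes above, the decoded text goes straight to the label mapper. If you want to inspect the raw answer yourself, a pattern like the following could extract it. This is a hypothetical post-processing sketch: extract_answer and the regular expression are assumptions, not part of OntoLearner.

```python
import re
from typing import Optional

# Matches 'answer': 'label', "answer": "label", or answer: label
ANSWER_RE = re.compile(r"""['"]?answer['"]?\s*:\s*['"]?([^'"\n,}]+)""", re.IGNORECASE)


def extract_answer(text: str) -> Optional[str]:
    """Return the label following an 'answer' key, or None if absent."""
    match = ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


print(extract_answer("Maize is a cereal. 'answer': 'crop'"))  # crop
print(extract_answer('{"answer": "yes"}'))                    # yes
print(extract_answer("The answer is unclear"))                # None
```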

Once your custom class is defined, you can pass it into AutoLLMLearner:

from ontolearner import AutoLLMLearner, LabelMapper, StandardizedPrompting

falcon_learner = AutoLLMLearner(
    prompting=StandardizedPrompting,
    label_mapper=LabelMapper(),
    llm=FalconLLM,      # 👈 plug in custom Falcon
    token="...",
    device="cuda"
)

falcon_learner.llm.load(model_id="tiiuae/Falcon-H1-1.5B-Deep-Instruct")

# Train and predict
falcon_learner.fit(train_data, task="term-typing")
predictions = falcon_learner.predict(test_data, task="term-typing")

print(predictions)

The following models have specialized classes within OntoLearner:

To use mistralai/Mistral-Small-3.2-24B-Instruct-2506, use MistralLLM instead of AutoLLM.
To use the Falcon-H series of LLMs (e.g., tiiuae/Falcon-H1-1.5B-Deep-Instruct), use FalconLLM instead of AutoLLM.

Note

You can implement as many custom AutoLLM classes as needed (e.g., for proprietary APIs, local models, or new HF releases). As long as they subclass AutoLLM and implement load + generate, they will work seamlessly with AutoLLMLearner.

Hint

See Learning Tasks for possible tasks within Learners.