First version of the model card

Signed-off-by: Andreas Florath <[email protected]>
---
license: bigcode-openrail-m
pipeline_tag: text-generation
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
---

# Model Card for CoqLLM-FineTuned-Experiment-Gen0

This model is an experiment in the realm of formal theorem proving,
specifically tailored for generating and interpreting Coq code. By
leveraging a comprehensive dataset derived from over 10,000 Coq source
files, *CoqLLM-FineTuned-Experiment-Gen0* exhibits an enhanced
proficiency in understanding the syntax and semantics unique to Coq,
thereby facilitating significant strides in automated theorem proving.

## Model Details

### Model Description

- **Developed by:** Andreas Florath
- **Model type:** Fine-tuned Large Language Model
- **Finetuned from model:** Mistral-7B (`mistralai/Mistral-7B-v0.1`)
- **Language(s) (NLP):** Coq (only)
- **License:** bigcode-openrail-m

## Model Sources

- **Paper:** "Enhancing Formal Theorem Proving: A Comprehensive
  Dataset for Training AI Models on Coq Code" by Andreas Florath.

## Uses

### Prompt Format

No special prompt format is needed. The model was fine-tuned on Coq
source code, so simply providing a proposition lets the model generate
a proof, for example:

```
Lemma plus_n_O : forall n:nat, n = n + 0.
```

No special characters or delimiters are needed.

### Direct Use

*CoqLLM-FineTuned-Experiment-Gen0* is an experiment that demonstrates
the usefulness of the dataset used for fine-tuning. The model can be
used to check whether short proofs can be generated automatically.
Another possible use case is curating existing Coq source code and
generating new Coq source code.

### Out-of-Scope Use

The model is not intended for general-purpose language tasks outside
the domain of theorem proving and formal verification. Misuse includes,
but is not limited to, non-Coq programming tasks, natural language
processing outside technical documentation, or any form of deployment
in critical systems without adequate supervision and validation.

### Bias, Risks, and Limitations

The model inherits biases from its base model and training data,
potentially reflecting the diversity, or lack thereof, of the collected
Coq files. Users should be wary of these limitations, particularly when
applying the model in new or underserved areas of theorem proving.

### Recommendations

To mitigate risks and biases, it is recommended to supplement model use
with human oversight or an environment where the generated Coq source
code can be verified automatically. Continuous monitoring for
unexpected behaviors or outputs is advised, alongside efforts to
diversify and expand the training dataset to cover a broader spectrum
of Coq use cases.
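One way to set up such an automatic-verification environment is to feed each generated candidate to `coqc` and keep only proofs that compile. The sketch below is illustrative, not part of the model card: the helper name `coq_accepts` and the sample proof script are assumptions, and it requires `coqc` on `PATH` (it returns `None` when the compiler is absent).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def coq_accepts(coq_source: str, timeout: int = 60):
    """Return True if coqc accepts the source, False if it rejects it,
    and None when coqc is not installed."""
    if shutil.which("coqc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        # Write the candidate to a throwaway .v file and compile it.
        vfile = Path(tmp) / "candidate.v"
        vfile.write_text(coq_source)
        proc = subprocess.run(["coqc", vfile.name],
                              cwd=tmp, capture_output=True, timeout=timeout)
        return proc.returncode == 0

# A hand-written proof of the lemma used as the example prompt.
candidate = ("Lemma plus_n_O : forall n:nat, n = n + 0.\n"
             "Proof. induction n; simpl; auto. Qed.\n")
print(coq_accepts(candidate))
```

In a generation loop, discarding every sample for which this check fails turns the model's sampling temperature into a search knob rather than a correctness risk.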
## How to Get Started with the Model

Here is a code snippet using the fine-tuned model. The setup shown
should work on GPUs with <= 24 GB of RAM. You might want to adapt it
and experiment with different settings, such as different temperatures.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config,
    device_map="auto", trust_remote_code=True,
)

eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id, add_bos_token=True, trust_remote_code=True)

ft_model = PeftModel.from_pretrained(
    base_model, "florath/CoqLLM-FineTuned-Experiment-Gen0")

eval_prompt = "Lemma plus_n_O : forall n:nat, n = n + 0."
# Tokenize the prompt and move it to the model's device.
model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to(ft_model.device)

ft_model.eval()
with torch.no_grad():
    # Sample ten candidate proofs for the same proposition.
    for idx in range(10):
        res = eval_tokenizer.decode(
            ft_model.generate(
                **model_input,
                max_new_tokens=50,
                do_sample=True,
                temperature=0.4,
                pad_token_id=eval_tokenizer.eos_token_id,
                repetition_penalty=1.15)[0], skip_special_tokens=False)
        print("Result [%2d] [%s]" % (idx, res))
```
## Training Details

The model was fine-tuned with the `florath/coq-facts-props-proofs-v1`
dataset. Only entries with permissive licenses were used during the
fine-tuning process.
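The permissive-license filter can be expressed as a simple predicate over dataset entries. This is a minimal sketch: the field name `license` and the exact set of accepted licenses are assumptions about the dataset schema, not something stated in this card.

```python
# Assumed set of permissive licenses; not taken from the dataset card.
PERMISSIVE_LICENSES = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}

def keep_entry(entry: dict) -> bool:
    """Keep only dataset entries published under a permissive license.

    The "license" field name is an assumption about the schema.
    """
    return entry.get("license") in PERMISSIVE_LICENSES

# Illustrative entries, not real dataset rows.
sample = [
    {"license": "MIT", "proof": "Proof. auto. Qed."},
    {"license": "GPL-3.0", "proof": "Proof. trivial. Qed."},
]
filtered = [e for e in sample if keep_entry(e)]
print(len(filtered))  # → 1
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter` to materialize the permissive subset.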