nielsr (HF staff) committed c3a5f8c (1 parent: a712fe6)

Update README.md

Files changed (1): README.md (+104 -1)
README.md CHANGED
@@ -42,4 +42,107 @@ fine-tuned versions on a task that interests you.

### How to use

- For code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example).
+ For code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example).
+
+ #### Running the model on CPU
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt")
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
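+
+ The same objects also produce a plain caption when no question is passed; a minimal sketch reusing `processor`, `model`, and `raw_image` from above (the prompt-free call is our assumption, based on the processor accepting an image alone):
+
+ ```python
+ # Captioning sketch: pass only the image, no text prompt.
+ inputs = processor(raw_image, return_tensors="pt")
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```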
+
+ #### Running the model on GPU
+
+ ##### In full precision
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", device_map="auto")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt").to("cuda")
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
+
+ ##### In half precision (`float16`)
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate
+ import torch
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", torch_dtype=torch.float16, device_map="auto")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
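+
+ To verify the savings, a quick check of the loaded weights' size; a minimal sketch, assuming a `transformers` version that exposes `get_memory_footprint` on the model:
+
+ ```python
+ # Rough size of the loaded weights in GB; float16 should come out at
+ # roughly half of the full-precision footprint.
+ print(f"{model.get_memory_footprint() / 1e9:.1f} GB")
+ ```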
+
+ ##### In 8-bit precision (`int8`)
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate bitsandbytes
+ import torch
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", load_in_8bit=True, device_map="auto")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
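+
+ One step further down the same precision ladder, a hypothetical 4-bit (NF4) variant; this is a sketch assuming recent `transformers` and `bitsandbytes` releases with 4-bit quantization support, and the loaded model is used exactly as in the 8-bit example:
+
+ ```python
+ # pip install accelerate bitsandbytes
+ import torch
+ from transformers import BitsAndBytesConfig, Blip2Processor, Blip2ForConditionalGeneration
+
+ # Assumed API: BitsAndBytesConfig with 4-bit NF4 weights and
+ # float16 compute for the matmuls.
+ quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+ model = Blip2ForConditionalGeneration.from_pretrained(
+     "Salesforce/blip2-flan-t5-xxl", quantization_config=quant_config, device_map="auto"
+ )
+ ```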