hlky


AI & ML interests

DATA


hlky's activity

posted an update 25 days ago
replied to their post 26 days ago

Thanks, that's helpful. Currently the title and description are not ideal for this kind of filtering. We'd need all the images captioned and classes/categories extracted from the captions. Captioning the set is planned, and I'm building https://github.com/bigdata-pw/florence-tool for that purpose. Another (very exciting) project is my priority right now, but I will aim to get an initial version of this UI out soon. It will focus on image datasets like Flickr, with a gallery-type view for quick review and selection, plus filtering options. As mentioned, the usefulness of text-based filtering will be limited until captions/classes are available. Still, it will be useful to filter on available image resolutions, view count (popularity), etc.
For reference, the image sizes (url_n, url_w, url_m, etc.) are documented here: https://www.flickr.com/services/api/misc.urls.html
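To make the resolution and view-count filtering concrete, here is a rough sketch of how it could work. It is an illustration only, not the planned UI's implementation. It assumes each record is a dict with optional `url_<suffix>` keys and a `views` field; those field names, and the helper names `largest_available_size` and `filter_records`, are hypothetical. The pixel values are the longest-edge sizes for each suffix as I understand them from the Flickr page linked above.

```python
from typing import Iterable, Iterator

# Longest edge in pixels for some Flickr size suffixes (per the Flickr URL docs).
FLICKR_LONGEST_EDGE = {
    "t": 100, "m": 240, "n": 320, "w": 400,
    "z": 640, "c": 800, "b": 1024,
}


def largest_available_size(record: dict) -> int:
    """Longest edge of the largest size whose url_<suffix> is present, else 0."""
    return max(
        (edge for suffix, edge in FLICKR_LONGEST_EDGE.items()
         if record.get(f"url_{suffix}")),
        default=0,
    )


def filter_records(
    records: Iterable[dict], min_edge: int = 640, min_views: int = 0
) -> Iterator[dict]:
    """Yield records with a large enough image and enough views."""
    for record in records:
        # 'views' is an assumed field name; missing or None counts as 0.
        views = int(record.get("views") or 0)
        if largest_available_size(record) >= min_edge and views >= min_views:
            yield record


# Hypothetical sample rows, for illustration only.
rows = [
    {"url_n": "https://example.com/a_n.jpg", "url_z": "https://example.com/a_z.jpg", "views": "120"},
    {"url_m": "https://example.com/b_m.jpg", "views": "5000"},
    {"url_b": "https://example.com/c_b.jpg", "views": None},
]
selected = list(filter_records(rows, min_edge=640, min_views=100))
```

With these sample rows only the first record passes: the second has no size of 640 px or larger, and the third has no view count.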

replied to their post 27 days ago

Sure, it's a fun and useful project. I've already made a start on some of the basic features. If you could tell me more about how you're expecting it to work and what the user interface should be like, that would help refine it.

replied to their post 29 days ago

Please refrain from advertising your service on my post, thanks!

posted an update 29 days ago
BIG update dropped for bigdata-pw/Flickr - now ~515M images! Target for the next update: 1B

In case you missed them, other recent drops include bigdata-pw/Dinosaurs, a small set of BIG creatures 🦕🦖, and the first in a series of articles about the art of web scraping! https://huggingface.co/blog/hlky/web-scraping-101 https://huggingface.co/blog/hlky/web-scraping-102

Stay tuned for exciting datasets and models coming soon:
- PC and Console game screenshots
- TV/film actor biographies and photos (think facial recognition and automatic captioning!)
- bigdata-pw/lyrics-gpt v2
- and more!
posted an update about 1 month ago
Announcing another BIG data drop! This time it's ~275M images from Flickr: bigdata-pw/Flickr

Data acquisition for this project is still in progress, so get ready for an update soon™

In case you missed them, other BIG data drops include Diffusion1B (bigdata-pw/Diffusion1B), with ~1.23B images and generation parameters from a variety of diffusion models. If you fancy practicing diffusion model training, check out Dataception (bigdata-pw/Dataception), a dataset of over 5000 datasets in WebDataset format!

Requests are always welcome, so reach out if there's a dataset you'd like to see!