Wiki Edit Blocks
Use data with limited resource

(Document WIP) This page explains how to use the data when only limited resources are available (e.g., limited memory or limited storage space).

Intuition

Revision-based datasets can be extremely large. For example, the basic Wikipedia Edit History dump is around 25 TB when decompressed, so decompressing all warehouses at once and using them for training is impractical.
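One common way around this is to stream records from compressed warehouse files one at a time, rather than decompressing everything up front. The sketch below is a minimal illustration of that idea, assuming a gzip-compressed JSONL layout and a `rev_id`/`text` record schema; these are assumptions for demonstration, not BloArk's actual file format or API.

```python
import gzip
import json
from pathlib import Path

def stream_records(path):
    """Yield JSON records lazily from a gzip-compressed JSONL file.

    Only one line is decompressed and held in memory at a time, so
    memory usage stays flat regardless of the warehouse's total size.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Create a tiny demo warehouse so the sketch is runnable.
# (File name and record fields are hypothetical.)
demo = Path("warehouse_demo.jsonl.gz")
with gzip.open(demo, "wt", encoding="utf-8") as f:
    for i in range(3):
        f.write(json.dumps({"rev_id": i, "text": f"revision {i}"}) + "\n")

# Consume records one at a time instead of loading the whole file.
for record in stream_records(demo):
    print(record["rev_id"])

demo.unlink()
```

Because `stream_records` is a generator, it composes naturally with training loops or any per-record processing, and the same pattern extends to iterating over many warehouse files sequentially so that at most one is being read at any moment.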
