Python
from bloark import BloArk

# Initialize BloArk with your dataset
bloark = BloArk(data_path="./your-dataset")

# Process and access your data
for block in bloark:
    # Your processing logic here
    pass
Wikipedia Edit Blocks is a high-efficiency snapshot of Wikipedia edit history, packaged as reusable blocks for research. This documentation shows how to go from the raw dataset to analysis-ready structures with as little custom plumbing as possible.
All examples in this documentation use the BloArk data architecture. BloArk is optimized for revision-based datasets and is designed to make indexing and processing large histories fast and predictable.
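To make "revision-based" concrete, here is a minimal sketch of the kind of per-article grouping you might do once a block is in hand. It uses only the Python standard library and does not call BloArk. It assumes, purely for illustration, that a block can be read as JSON Lines with one revision record per line; the field names `article_id`, `revision_id`, and `timestamp` are hypothetical and may not match the actual dataset schema.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_revisions(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group JSON-encoded revision records by article, oldest first.

    Assumption: each line is one JSON object with the hypothetical keys
    "article_id" and "timestamp" (ISO 8601, UTC, uniform format).
    """
    history: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        history[record["article_id"]].append(record)

    # Uniformly formatted ISO 8601 UTC timestamps sort correctly as strings.
    for revisions in history.values():
        revisions.sort(key=lambda r: r["timestamp"])
    return dict(history)


# Tiny in-memory stand-in for one block (made-up records, hypothetical schema).
sample_block = [
    '{"article_id": "A", "revision_id": 2, "timestamp": "2020-01-02T00:00:00Z"}',
    '{"article_id": "B", "revision_id": 3, "timestamp": "2020-01-03T00:00:00Z"}',
    '{"article_id": "A", "revision_id": 1, "timestamp": "2020-01-01T00:00:00Z"}',
]

history = group_revisions(sample_block)
for article_id, revisions in history.items():
    print(article_id, [r["revision_id"] for r in revisions])
```

In practice the lines would come from a file handle rather than an in-memory list; the function accepts any iterable of strings, so the same code works in both cases.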
With BloArk, you can:
If you have questions about this documentation, the Wikipedia Edit Blocks dataset, or BloArk itself, feel free to reach out to the maintainers listed on the project website.
© 2025 Lingxi Li.