go-air/dupi

⊧ dupi

Dupi is an engine for identifying and exploring duplicative text in sets of documents.

Status

Dupi is in an alpha/early beta stage of development. Please feel free to give it a try (and file issues). We have run it successfully on several document sets, but it definitely needs more testing.

Input

Throw hundreds of thousands of text documents at it, or extract text from other documents and send that to Dupi.

Output

Find and query for repeated chunks of text.

Tutorial

Tutorial

Design

Design Document

Library Reference

Go Reference
