Ambar: Libre Document Search Engine for Office, Text and PDF Documents
Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.
Ambar defines a new way to integrate full-text document search into your workflow.
- Easily deploy Ambar with a single docker-compose file
- Perform Google-like search through your documents and the contents of your images
- Tag your documents
- Use a simple REST API to integrate Ambar into your workflow
Search
Tutorial: Mastering Ambar Search Queries
- Fuzzy Search (John~3)
- Phrase Search ("John Smith")
- Search By Author (author:John)
- Search By File Path (filename:*.txt)
- Search By Date (when: yesterday, today, lastweek, etc.)
- Search By Size (size>1M)
- Search By Tags (tags:ocr)
- Search As You Type
- Supported language analyzers: English (ambar_en), Russian (ambar_ru), German (ambar_de), Italian (ambar_it), Polish (ambar_pl), Chinese (ambar_cn), CJK (ambar_cjk)
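The search operators listed above can be combined in a single query. As a rough illustration, the hypothetical Python helper below (not part of Ambar or its REST API) assembles such a query string. It assumes that operators are separated by spaces and written without a space after the colon, which matches most of the examples above but is not confirmed for every operator.

```python
def build_query(text="", author=None, filename=None, when=None, size=None, tags=None):
    """Hypothetical helper: join free text and Ambar-style operators with spaces.

    Assumption: Ambar accepts several operators in one query separated by
    whitespace. The query itself is still parsed by Ambar, not by this code.
    """
    parts = []
    if text:
        parts.append(text)                   # e.g. 'John~3' or '"John Smith"'
    if author:
        parts.append(f"author:{author}")     # Search By Author
    if filename:
        parts.append(f"filename:{filename}") # Search By File Path, wildcards allowed
    if when:
        parts.append(f"when:{when}")         # Search By Date, e.g. 'yesterday'
    if size:
        parts.append(f"size{size}")          # Search By Size, e.g. '>1M'
    for tag in tags or []:
        parts.append(f"tags:{tag}")          # Search By Tags, one operator per tag
    return " ".join(parts)


print(build_query('"John Smith"', author="John", tags=["ocr"]))
```

Keeping each operator as a separate, space-delimited token mirrors how the examples above are written, so the helper only concatenates strings and leaves all interpretation to Ambar.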
Crawling
Ambar 2.0 only supports local file system crawling. If you need to crawl an SMB share or an FTP location, just mount it using standard Linux tools. Crawling is automatic: no schedule is needed because the crawlers monitor file system events and automatically process new, changed, and removed files.
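To make the new/changed/removed classification concrete, here is a minimal sketch in Python. Note that it swaps in a different technique: instead of subscribing to file system events as Ambar's crawlers do, it polls by taking two snapshots of a directory tree and comparing them. The functions `snapshot` and `diff` are hypothetical and are not Ambar code; an event-driven watcher on Linux would typically rely on inotify instead.

```python
import os


def snapshot(root):
    """Map every file path under root to its (modification time in ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file was removed between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Classify paths as added, changed or removed between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed
```

A crawler built this way would call `snapshot` periodically and hand the three lists to its processing queue; an event-based crawler avoids the repeated full directory walks, which is one reason a design like Ambar's needs no schedule.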
Content Extraction
Ambar supports large files (>30MB) and multithreaded processing.
Supported file types:
- ZIP archives
- Mail archives (PST)
- MS Office documents (Word, Excel, PowerPoint, Visio, Publisher)
- OCR over images
- Email messages with attachments
- Adobe PDF (with OCR)
- OpenOffice documents
- RTF, plain text
- HTML / XHTML
OCR languages: English, Russian, Italian, German, French, Spanish, Polish, Dutch.
License
Ambar is released under the MIT License.