Ambar: Libre Document Search Engine for Office, Text and PDF Documents

Hazem Abbas

Dec 10, 2022 — 1 min read

Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar offers a new way to bring full-text document search into your workflow:

Easily deploy Ambar with a single docker-compose file
Perform Google-like searches across your documents and the contents of your images
Tag your documents
Use a simple REST API to integrate Ambar into your workflow

Search

Tutorial: Mastering Ambar Search Queries

Fuzzy Search (John~3)
Phrase Search ("John Smith")
Search By Author (author:John)
Search By File Path (filename:*.txt)
Search By Date (when: yesterday, today, lastweek, etc)
Search By Size (size>1M)
Search By Tags (tags:ocr)
Search As You Type
Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
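
The operators listed above can be combined into a single query string. The following is a minimal Python sketch, not part of Ambar itself: `build_query` is a hypothetical helper, and it assumes that filters are separated by spaces and that the date filter is written without a space after the colon (`when:yesterday`).

```python
from typing import Iterable, Optional, Tuple


def build_query(
    words: Iterable[str] = (),
    phrase: Optional[str] = None,
    fuzzy: Optional[Tuple[str, int]] = None,
    author: Optional[str] = None,
    filename: Optional[str] = None,
    when: Optional[str] = None,
    size: Optional[str] = None,
    tag: Optional[str] = None,
) -> str:
    """Join the given filters into one space-separated query string.

    Hypothetical helper: the operator spellings come from the list above,
    the space-separated combination is an assumption.
    """
    parts = list(words)
    if phrase:
        parts.append(f'"{phrase}"')          # phrase search: "John Smith"
    if fuzzy:
        word, distance = fuzzy
        parts.append(f"{word}~{distance}")   # fuzzy search: John~3
    if author:
        parts.append(f"author:{author}")     # search by author
    if filename:
        parts.append(f"filename:{filename}") # search by file path, e.g. *.txt
    if when:
        parts.append(f"when:{when}")         # yesterday, today, lastweek, ...
    if size:
        parts.append(f"size{size}")          # pass the comparison, e.g. ">1M"
    if tag:
        parts.append(f"tags:{tag}")          # search by tag, e.g. ocr
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(phrase="John Smith", author="John", filename="*.txt"))
```

Keeping each operator in its own keyword argument makes it harder to mistype a prefix, but the resulting string should still be checked against the Ambar search tutorial before relying on it.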

Crawling

Ambar 2.0 only supports local file system crawling. If you need to crawl an SMB share or an FTP location, mount it using standard Linux tools. Crawling is automatic and needs no schedule, because the crawlers monitor file system events and automatically process new, changed and removed files.
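
To illustrate what detecting new, changed and removed files involves, here is a minimal Python sketch. It is not Ambar's crawler: instead of subscribing to file system events, it compares two directory snapshots, which is a simpler polling technique swapped in for illustration. The names `take_snapshot` and `diff_snapshots` are hypothetical.

```python
import os
from typing import Dict, Set, Tuple

# Maps a file path to its (modification time, size in bytes).
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record modification time and size for every file under root."""
    snapshot: Snapshot = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            snapshot[path] = (info.st_mtime, info.st_size)
    return snapshot


def diff_snapshots(
    old: Snapshot, new: Snapshot
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return the (added, changed, removed) paths between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    # A file counts as changed when its mtime or size differs.
    changed = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, changed, removed
```

Calling `take_snapshot` on a mounted share before and after a change and passing both results to `diff_snapshots` yields the three sets a crawler would need to index, re-index, or drop. An event-based approach avoids rescanning the whole tree, which is why it scales better for large document stores.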

Content Extraction

Ambar supports large files (>30MB)

Supported file types:

ZIP archives
Mail archives (PST)
MS Office documents (Word, Excel, PowerPoint, Visio, Publisher)
OCR over images
Email messages with attachments
Adobe PDF (with OCR)
OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
OpenOffice documents
RTF, Plaintext
HTML / XHTML
Multithread processing

License

Ambar is released under the MIT License.

Resources

https://github.com/RD17/ambar
