Harvest News Like a Pro: Introducing News-Please, Your Open-Source Solution for News Extraction and Archiving

Hazem Abbas

Mar 21, 2024

news-please is an open-source news crawler that extracts structured information from news websites. It builds on libraries such as Scrapy, Newspaper, and readability. It can follow a site's internal hyperlinks and read its RSS feeds, so it fetches both recent and archived articles.

It also features a library mode for Python developers and can extract articles from the large news archive at commoncrawl.org.

Features

works out of the box: install with pip, add the URLs of your pages, and run
run news-please conveniently in its CLI mode
use it as a library within your own Python code
extract articles from commoncrawl.org's news archive
store extracted results in JSON files, PostgreSQL, Elasticsearch, or your own storage
simple but extensive configuration (if you want to tweak the results)
revisions: crawl articles multiple times and track changes
crawl and extract information given a list of article URLs
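
To give a feel for library mode, here is a minimal sketch. It assumes the `NewsPlease.from_url` entry point and the `title`, `language`, and `maintext` attribute names described in the project's README. The helpers `summarize` and `fetch_summary` and the stand-in article are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def summarize(article, width=60):
    """Build a one-line summary from an article-like object.

    Assumes the object exposes `title`, `language`, and `maintext`,
    the attribute names news-please is assumed to use here.
    """
    text = (article.maintext or "").strip().replace("\n", " ")
    if len(text) > width:
        text = text[:width].rstrip() + "..."
    return f"[{article.language}] {article.title}: {text}"


def fetch_summary(url):
    """Download one article and summarize it (needs network access)."""
    # Imported lazily so summarize() works without the package installed.
    from newsplease import NewsPlease  # pip3 install news-please
    return summarize(NewsPlease.from_url(url))


# Offline demonstration with a stand-in object instead of a live download.
demo = SimpleNamespace(
    title="Example headline",
    language="en",
    maintext="First paragraph of the article.\nSecond paragraph.",
)
print(summarize(demo))
```

Calling `fetch_summary` with the URL of a real article would run the same summary on a live download. Keeping the network call in its own function makes the formatting logic easy to test offline.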

Extracted information

news-please extracts the following attributes from news articles. An example JSON file as extracted by news-please is available in the project's repository.

headline
lead paragraph
main text
main image
name(s) of author(s)
publication date
language
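
When results are stored as JSON files, each record can be loaded back into a typed structure. The sketch below assumes the JSON keys `title`, `description`, `maintext`, `image_url`, `authors`, `date_publish`, and `language`, and a `YYYY-MM-DD HH:MM:SS` date format. Check both against a file produced by your own crawl. The `ArticleRecord` class, the `parse_record` helper, and the sample record are hypothetical.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ArticleRecord:
    """One extracted article; field names mirror the assumed JSON keys."""
    title: Optional[str] = None          # headline
    description: Optional[str] = None    # lead paragraph
    maintext: Optional[str] = None       # main text
    image_url: Optional[str] = None      # main image
    authors: List[str] = field(default_factory=list)
    date_publish: Optional[datetime] = None
    language: Optional[str] = None


def parse_record(raw: str) -> ArticleRecord:
    """Parse one JSON document into an ArticleRecord, tolerating missing keys."""
    data = json.loads(raw)
    published = data.get("date_publish")
    return ArticleRecord(
        title=data.get("title"),
        description=data.get("description"),
        maintext=data.get("maintext"),
        image_url=data.get("image_url"),
        authors=data.get("authors") or [],
        # Assumed timestamp format; adjust if your files differ.
        date_publish=(datetime.strptime(published, "%Y-%m-%d %H:%M:%S")
                      if published else None),
        language=data.get("language"),
    )


# Made-up sample record for illustration only.
SAMPLE = json.dumps({
    "title": "Example headline",
    "description": "A short lead paragraph.",
    "maintext": "The body of the article.",
    "image_url": "https://example.com/image.jpg",
    "authors": ["Jane Doe"],
    "date_publish": "2024-03-21 09:30:00",
    "language": "en",
})

record = parse_record(SAMPLE)
print(record.title, record.authors, record.date_publish.isoformat())
```

Defaulting every field to `None` or an empty list keeps the parser usable on articles where an attribute, such as the author, could not be extracted.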

Install

$ pip3 install news-please

License

Apache-2.0 License

Resources & Downloads

Source-code download
