Hey there! I recently worked on a small project: a simple web app that lets you search for Wikipedia articles and read them in a chat-like interface. I used Streamlit to build the app and BeautifulSoup for web scraping, and I wanted to share how I did it so you can try it out too!
What You Need
Before we dive in, make sure you have these Python libraries installed:
- streamlit: To build the web app.
- requests: To send requests to websites and get data.
- beautifulsoup4: To scrape and parse the HTML content.
You can install them using pip:
```bash
pip install streamlit requests beautifulsoup4
```
The Code Explained
1. Setting Up the App
First, I imported the necessary libraries and set up the basic configuration for the Streamlit app.
```python
import streamlit as st
import requests
from bs4 import BeautifulSoup
import time

# Basic page configuration: browser-tab title and icon
st.set_page_config(page_title="WikiStream", page_icon="ℹ️")
st.title("Wiki-Fetch")
st.sidebar.title("Options")
```
2. Adding Themes and Chat Interface
I added an option in the sidebar for users to switch between Light and Dark themes. I also set up a basic chat interface where the user can enter a topic and see the responses.
```python
# Theme switcher: inject custom CSS when the user picks "Dark"
theme = st.sidebar.selectbox("Choose a theme", ["Light", "Dark"])
if theme == "Dark":
    st.markdown("""
    <style>
    .stApp {
        background-color: #2b2b2b;
        color: white;
    }
    </style>
    """, unsafe_allow_html=True)

# Chat history is kept in session_state so it survives reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the stored conversation on every run
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
```
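Streamlit re-executes the whole script from top to bottom on every interaction, which is why the messages live in `st.session_state` and are replayed each time. As a rough, offline illustration, the sketch below uses a plain dict as a hypothetical stand-in for `st.session_state` and a hypothetical `run_script` function to represent one rerun; it returns the lines that would be drawn instead of calling `st.chat_message`.

```python
def run_script(session_state, prompt=None):
    """Simulate one Streamlit rerun (hypothetical stand-in, not Streamlit's API)."""
    # Same initialization guard as the app: only create the list once
    if "messages" not in session_state:
        session_state["messages"] = []

    # Replay everything stored so far, like the loop over st.session_state.messages
    rendered = [f"{m['role']}: {m['content']}" for m in session_state["messages"]]

    # A new prompt is stored, so it will also be replayed on the next rerun
    if prompt:
        session_state["messages"].append({"role": "user", "content": prompt})
        rendered.append(f"user: {prompt}")
    return rendered


state = {}  # survives between "reruns", like st.session_state
run_script(state, "Alan Turing")  # first run: the user types a topic
print(run_script(state))          # second run: history is replayed
# → ['user: Alan Turing']
```

If the list were a normal local variable instead, it would be recreated empty on every run and the conversation would vanish after each interaction.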
3. Generating and Fetching Wikipedia Links
Next, I created a function that builds a Google search URL from the user's input. Then I scraped the search results for the first English Wikipedia link and fetched the content of that page.
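```python
from urllib.parse import quote_plus, unquote

def generate_link(prompt):
    # Build a Google search URL such as ".../search?q=alan+turing+wiki"
    if prompt:
        return "https://www.google.com/search?q=" + quote_plus(prompt + " wiki")
    return None

def generating_wiki_link(link):
    # Fetch the Google results page and look for the first English Wikipedia link
    res = requests.get(link, timeout=10)
    soup = BeautifulSoup(res.text, "html.parser")
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if "en.wikipedia.org" in href:
            # Google wraps result links as "/url?q=<target>&sa=...";
            # strip that wrapper if present and keep only the target URL
            if href.startswith("/url?q="):
                href = href[len("/url?q="):].split("&")[0]
            yield from scraping_data(unquote(href))
            return
    # Nothing found: yield a short message instead of crashing the chat loop
    yield "Sorry, I couldn't find a Wikipedia article for that topic."
```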
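The original `link[7:].split('&')[0]` trick works because Google wraps each result as `/url?q=<target>&sa=...`, and `/url?q=` is exactly 7 characters long. A somewhat sturdier sketch lets `urllib.parse` handle the encoding and decoding. The helper names `build_search_url` and `extract_target_url` are hypothetical, and the `/url?q=` format is an assumption about Google's result markup, which changes over time and may differ for automated requests.

```python
from urllib.parse import parse_qs, quote_plus, urlparse


def build_search_url(prompt):
    """Build a Google search URL for '<prompt> wiki' (hypothetical helper)."""
    if not prompt:
        return None
    # quote_plus turns spaces into '+' and escapes characters such as '+' or '&'
    return "https://www.google.com/search?q=" + quote_plus(prompt + " wiki")


def extract_target_url(href):
    """Return the real URL from a Google '/url?q=...' href, or the href unchanged."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs splits on '&' and decodes percent-escapes for us
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href


print(build_search_url("C++ language"))
# → https://www.google.com/search?q=C%2B%2B+language+wiki
print(extract_target_url("/url?q=https://en.wikipedia.org/wiki/Alan_Turing&sa=U&ved=abc"))
# → https://en.wikipedia.org/wiki/Alan_Turing
```

With the plain `replace(" ", "+")` approach, a topic like "C++" would reach Google as "C+++wiki", where the "+" signs are read as spaces; `quote_plus` avoids that.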
4. Scraping Wikipedia Content
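This is where the content gets extracted from Wikipedia. I used BeautifulSoup to collect the text of every paragraph (<p> element) on the page, strip out numbered citation markers such as [1], and then stream the text word by word at a speed the user picks in the sidebar.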
```python
import re

def scraping_data(link):
    # Download the article and join the text of every <p> element
    res = requests.get(link, timeout=10)
    soup = BeautifulSoup(res.text, "html.parser")
    corpus = "\n".join(p.get_text() for p in soup.find_all("p")).strip()

    # Remove numbered citation markers such as [1] or [42]
    corpus = re.sub(r"\[\d+\]", " ", corpus)

    # Stream the text word by word; the delay comes from a sidebar slider
    speed = st.sidebar.slider("Text Speed", 0.1, 1.0, 0.2, 0.1)
    for word in corpus.split():
        yield word + " "
        time.sleep(speed)
```
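The cleanup and streaming steps don't depend on Streamlit, so they can be tried offline. Below is a sketch with hypothetical helpers `clean_corpus` and `stream_words`. It takes already-extracted paragraph strings instead of HTML, and uses a regular expression rather than a fixed `range(1, 500)` loop, so citation numbers of any size are removed; matching Wikipedia's "[citation needed]" tag as well is an extra assumption of this sketch.

```python
import re
import time

# Matches markers like [1], [123] and [citation needed]
CITATION = re.compile(r"\[(?:\d+|citation needed)\]")


def clean_corpus(paragraphs):
    """Join paragraph texts and strip citation markers (hypothetical helper)."""
    corpus = "\n".join(paragraphs).strip()
    corpus = CITATION.sub(" ", corpus)
    # Collapse the extra whitespace left behind by the removed markers
    return " ".join(corpus.split())


def stream_words(text, delay=0.2):
    """Yield text one word at a time, pausing `delay` seconds after each word."""
    for word in text.split():
        yield word + " "
        time.sleep(delay)


text = clean_corpus([
    "Alan Turing was an English mathematician.[1]",
    "He worked at Bletchley Park.[2][3][citation needed]",
])
print(text)
# → Alan Turing was an English mathematician. He worked at Bletchley Park.
print(list(stream_words(text, delay=0))[:3])
# → ['Alan ', 'Turing ', 'was ']
```

Keeping `stream_words` as a generator is what lets the chat loop update the placeholder after every word instead of waiting for the whole article.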
5. Getting a Random Wikipedia Topic
I added a fun feature that lets you fetch a random Wikipedia article. It's great for those moments when you just want to learn something new without having to think of a topic.
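```python
def get_random_wikipedia_topic():
    # Special:Random redirects to a random article; requests follows the redirect
    url = "https://en.wikipedia.org/wiki/Special:Random"
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    # The article title is in <h1 id="firstHeading">
    return soup.find("h1", {"id": "firstHeading"}).get_text(strip=True)
```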
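Because `requests.get` follows redirects by default, `response.url` already holds the random article's final address. One possible variation (not what the app above does) is to read the title from that URL instead of parsing the `<h1>`, and to keep the URL so it could be scraped directly, skipping the Google round trip. The helper `title_from_wiki_url` is hypothetical and assumes the usual `/wiki/<Title>` path layout.

```python
from urllib.parse import unquote, urlparse


def title_from_wiki_url(url):
    """Turn 'https://en.wikipedia.org/wiki/Some_Title' into 'Some Title'."""
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    # Decode percent-escapes (e.g. %C3%A9 -> é) and turn underscores into spaces
    return unquote(path[len(prefix):]).replace("_", " ")


print(title_from_wiki_url("https://en.wikipedia.org/wiki/Caf%C3%A9_au_lait"))
# → Café au lait
```

In the app this would be called as `title_from_wiki_url(response.url)` right after the request.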
6. Handling User Input and Displaying Content
Finally, I handled the user input (a typed topic or a random one) and streamed the article into a chat-like interface. I also added sidebar buttons to clear the chat history and to show a quick summary of the last response, which is simply its first 50 words.
```python
# The topic comes from the chat box, or from the random-topic button below
prompt = st.chat_input("Enter a topic")

if st.sidebar.button("Get Random Wikipedia Topic"):
    random_topic = get_random_wikipedia_topic()
    st.sidebar.write(f"Random Topic: {random_topic}")
    prompt = random_topic

if prompt:
    link = generate_link(prompt)
    # Show and store the user's message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    # Stream the article into a placeholder, with a "▌" cursor while typing
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        for chunk in generating_wiki_link(link):
            full_response += chunk
            message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)
    st.session_state.messages.append({"role": "assistant", "content": full_response})

if st.sidebar.button("Clear Chat History"):
    st.session_state.messages = []
    st.rerun()

# The "summary" is the first 50 words of the last assistant reply
if st.sidebar.button("Summarize Last Response"):
    if st.session_state.messages and st.session_state.messages[-1]["role"] == "assistant":
        last_response = st.session_state.messages[-1]["content"]
        summary = " ".join(last_response.split()[:50]) + "..."
        st.sidebar.markdown("### Summary")
        st.sidebar.write(summary)
```
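The "summary" is really a preview: the first 50 words plus "...", and the ellipsis is appended even when the reply is shorter than that. Here is a small sketch of a hypothetical `preview` helper that adds the ellipsis only when something was actually cut off; it is still plain truncation, not a real summarization algorithm.

```python
def preview(text, max_words=50):
    """Return the first `max_words` words, adding '...' only if text was cut."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


print(preview("Alan Turing was an English mathematician", max_words=3))
# → Alan Turing was...
print(preview("Short reply"))
# → Short reply
```

In the sidebar handler, `summary = preview(last_response)` would replace the inline join.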
Click the link below to start exploring:
https://wiki-verse.streamlit.app/
Check out the code behind Wiki-Fetch on GitHub!
Happy browsing!
Conclusion
And that's it! I've built a simple yet functional Wikipedia search app using Streamlit and BeautifulSoup. This was a fun project to work on, and I hope you find it just as enjoyable to try out. If you have any questions or feedback, feel free to reach out. Happy coding!