[{"content":"For the last few years, I\u0026rsquo;ve maintained two separate websites. My personal website — built with Hugo Apero — housed my publications, projects, talks, and an about page. My blog — a separate Hugo site with the Archie theme — held all my writing. The idea was to keep \u0026ldquo;professional\u0026rdquo; and \u0026ldquo;personal\u0026rdquo; apart. In practice, it meant I was never sure where something belonged. Is a technical tutorial about R a \u0026ldquo;project\u0026rdquo; or a \u0026ldquo;blog post\u0026rdquo;? Is a conference reflection a \u0026ldquo;talk\u0026rdquo; or a piece of writing? The boundaries were artificial, and maintaining two sites meant twice the headaches.\nSo I merged them.\nWhat Changed Everything now lives at harsh17.in. One site, one theme, one place to find anything I\u0026rsquo;ve written or built.\nThe new site uses PaperMod — a minimal Hugo theme that I\u0026rsquo;ve customized with a warm cream palette, serif typography (Libre Baskerville), and a layout inspired by academic sites like Kieran Healy\u0026rsquo;s. The aesthetic is deliberately quiet. No boxes, no cards, no sidebars. Just text, links, and whitespace. I wanted it to feel like opening a well-organized notebook.\nOh, and the cursor is an autorickshaw. Because why not. 🛺\nWhere Things Are The navigation is simple:\nAbout — who I am, what I do, how to reach me Research — my publications, from the dissertation to conference papers Writing — everything I write: essays, tutorials, travel notes, philosophy, technical notes, coffee musings. All of it, in one stream. You can filter by tags at the top of the page — click \u0026ldquo;philosophy\u0026rdquo; or \u0026ldquo;travel\u0026rdquo; or \u0026ldquo;data-science\u0026rdquo; to narrow things down Talks — conference presentations, guest lectures, workshops CV — PDF of my CV If you\u0026rsquo;re looking for something specific, there\u0026rsquo;s a search page too.\nAll the old URLs still work. If you had bookmarked blog.harsh17.in/meditation/, it redirects to harsh17.in/meditation/. Nothing is lost.\nWhat I Dropped I used to run everything through blogdown in RStudio, which meant every post was an R Markdown file that needed R packages, knitting, and a specific Hugo version pinned in .Rprofile. It broke constantly — missing packages, stale caches, version conflicts. Every few months I\u0026rsquo;d sit down to write and spend an hour fixing the build instead.\nNo more. The new site is plain Markdown. I edit in VS Code (or any text editor), run hugo server in the terminal, and that\u0026rsquo;s it. No R dependency, no blogdown, no .Rmd files. The old R Markdown sources are archived safely, and the rendered Markdown carries forward all the charts and tables they produced.\nI also replaced the embedded Are.na iframes with simple links. Those iframes were loading entire pages worth of content on every visit and eating through my Netlify bandwidth. A link works just as well.\nStaying Updated If you\u0026rsquo;d like to follow along, the best way is through RSS — an open, simple protocol that lets you subscribe to websites without giving away your email or depending on any platform\u0026rsquo;s algorithm.\nHere\u0026rsquo;s the feed link — copy it into your reader of choice:\nhttps://harsh17.in/writing/index.xml New to RSS? It\u0026rsquo;s like a personal newsfeed that you control. You pick the sources, and new posts show up in your reader automatically. No newsletters to manage, no inbox clutter, no tracking pixels. 
Here are some good readers to get started:\niOS: Feeeed — beautiful, simple, free\nMac: NetNewsWire — open source, fast, no account needed\nWeb: Feedly — works everywhere, free tier is generous\nEmail delivery: Blogtrottr — if you prefer getting posts in your inbox, paste the RSS link here and Blogtrottr will email you whenever I publish something new. No account on my end, no subscriber list — they handle it all.\nI considered setting up a newsletter but every free service either charges for RSS-to-email automation or requires infrastructure I don\u0026rsquo;t want to maintain. RSS is simpler, more private, and more honest. You subscribe when you want, unsubscribe by removing the feed, and I never see your email address.\nThe old personal website and blog are still accessible at hv.netlify.app and hvblog.netlify.app if you\u0026rsquo;re feeling nostalgic. And the even older Google Sites version — my very first website — lives at harsh17.in/old.\nThree generations of the same impulse: put things on the internet and hope someone finds them interesting.\nHere\u0026rsquo;s to the new home. 🏡\n","permalink":"/new-website/","summary":"\u003cp\u003eFor the last few years, I\u0026rsquo;ve maintained two separate websites. My \u003ca href=\"https://hv.netlify.app/\"\u003epersonal website\u003c/a\u003e — built with Hugo Apéro — housed my publications, projects, talks, and an about page. My \u003ca href=\"https://hvblog.netlify.app/\"\u003eblog\u003c/a\u003e — a separate Hugo site with the Archie theme — held all my writing. The idea was to keep \u0026ldquo;professional\u0026rdquo; and \u0026ldquo;personal\u0026rdquo; apart. In practice, it meant I was never sure where something belonged. Is a technical tutorial about R a \u0026ldquo;project\u0026rdquo; or a \u0026ldquo;blog post\u0026rdquo;? Is a conference reflection a \u0026ldquo;talk\u0026rdquo; or a piece of writing? The boundaries were artificial, and maintaining two sites meant twice the headaches.\u003c/p\u003e","title":"A New Home on the Internet"},{"content":"Macchi (मक्खी) is Hindi for fly \u0026mdash; the pesky kind that won\u0026rsquo;t leave your food alone. If you\u0026rsquo;ve ever left trash out for too long in an Indian summer, you know the flies will find it before you do.\nThat\u0026rsquo;s the entire premise of this app: if your Mac\u0026rsquo;s Trash has items in it, shouldn\u0026rsquo;t it attract flies?\nWhat It Does Macchi Trash watches your ~/.Trash directory. When there are items in it and you hover your cursor over the Trash icon in your Dock, small animated flies appear and buzz around the icon. Move your cursor away and they disappear. Empty the trash and they\u0026rsquo;re gone for good \u0026mdash; until you throw something away again.\nIt\u0026rsquo;s a completely useless app and I love it.\nHow It Works The technical challenge was more interesting than you might expect. macOS doesn\u0026rsquo;t have a straightforward API for \u0026ldquo;tell me where the Trash icon is on screen.\u0026rdquo; Instead, Macchi Trash uses the Accessibility APIs to find the Dock process, locate the Trash icon within it, and get its screen coordinates. 
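The app itself is native Swift, but the watch-and-react loop at its heart is easy to illustrate in any language. A rough Python sketch, with simple polling standing in for the file system events the real app uses (this is an illustration of the idea, not the app\u0026rsquo;s actual code):
import time
from pathlib import Path

TRASH = Path.home() / ".Trash"

def trash_has_items() -> bool:
    # Ignore .DS_Store, the invisible metadata file macOS drops into folders.
    return any(p.name != ".DS_Store" for p in TRASH.iterdir())

while True:
    if trash_has_items():
        # The real app also requires the cursor to be near the Dock icon.
        print("Trash occupied: flies may spawn")
    time.sleep(2)  # polling only for illustration; the app reacts to FSEvents instead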
Then it monitors cursor position and overlays transparent windows with animated fly views when the conditions are right.\nThe app:\nWatches ~/.Trash for changes using file system events\nUses Accessibility APIs to find the Trash icon\u0026rsquo;s position in the Dock\nTracks cursor proximity to the icon\nWhen both conditions are met (trash has items + cursor is near), spawns SwiftUI fly animations at the icon\u0026rsquo;s coordinates\nSince Dock icon positions can shift (different screen sizes, Dock positions, icon arrangements), there\u0026rsquo;s a calibration tool built in. Run it once and the fly positioning stays accurate.\nInstall Download MacchiTrash.zip, extract, move to Applications, and right-click → Open the first time. Grant Accessibility permission when prompted.\nOr build from source:\ngit clone https://github.com/harshvardhaniimi/macchi-trash.git\ncd macchi-trash\n./build_app.sh\nopen MacchiTrash.app\nRequires macOS 13+ and Accessibility permission.\nNot every project needs a reason. Sometimes you just want flies on your trash.\n","permalink":"/macchi-trash/","summary":"A tiny macOS menu-bar app that shows animated flies buzzing around your Trash when it has items.","title":"Macchi Trash"},{"content":"I read a lot online. Articles, documentation, research threads, blog posts \u0026mdash; there\u0026rsquo;s always something worth saving. For a while, I\u0026rsquo;d been copying and pasting text into my notes, losing all the formatting in the process. Or I\u0026rsquo;d save the whole page as HTML and end up with a bloated file full of ads and navigation bars. What I really wanted was a quick way to grab just the article content, nicely formatted as Markdown.\nAdditionally, there are times when I want to share an article as context to an LLM prompt. Providing the URL is often not enough, especially if the content is behind a paywall, requires a login, or is blocked for AI chatbots. Being able to copy the main content as Markdown allows me to include it directly in my prompts without worrying about access issues.\nSo I built a browser extension that does exactly that.\nWhat It Does Click the toolbar icon and you get three options:\nCopy as Markdown \u0026mdash; extracts the main content of the page and copies it to your clipboard as Markdown\nCopy as Plain Text \u0026mdash; same extraction, but strips all formatting\nDownload as .md \u0026mdash; saves the page content as a Markdown file\nThe extension uses Mozilla Readability (the same engine behind Firefox\u0026rsquo;s Reader View) to pull out the article content. This means it automatically strips away navigation, sidebars, ads, and other noise. For pages that don\u0026rsquo;t have a clear \u0026ldquo;article\u0026rdquo; structure, it falls back to converting the entire page body.\nOnce it has the clean HTML, Turndown converts it to Markdown with GitHub Flavored Markdown support \u0026mdash; so tables, strikethrough, and task lists all come through correctly.\nBuilding It The extension is built with WXT, a framework for building cross-browser extensions with TypeScript. 
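Before getting into the build: the extract-then-convert step itself is easy to sketch outside the browser. Here is a rough Python equivalent, with the readability-lxml and markdownify packages standing in for Mozilla Readability and Turndown \u0026mdash; my substitutions for illustration, not what the extension actually ships:
import requests  # pip install requests readability-lxml markdownify
from readability import Document
from markdownify import markdownify

html = requests.get("https://example.com/some-article", timeout=10).text
doc = Document(html)             # Readability-style main-content extraction
article_html = doc.summary()     # clean HTML, navigation and ads stripped
print(doc.title())
print(markdownify(article_html, heading_style="ATX")[:500])
Back to the extension itself: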
WXT handles the annoying parts of extension development \u0026mdash; manifest generation, hot reloading in dev mode, and building for multiple browsers from a single codebase.\nThe core logic is surprisingly simple:\nInject a content script into the active tab\nClone the page\u0026rsquo;s DOM and run it through Readability\nConvert the extracted HTML to Markdown with Turndown\nCopy to clipboard or trigger a download\nThe whole thing is about 400 lines of TypeScript across four modules.\nChrome + Safari The extension builds for both Chrome and Safari. Chrome is straightforward \u0026mdash; load the unpacked build or package it for the Chrome Web Store. Safari requires converting the extension with Apple\u0026rsquo;s tooling and running through Xcode, which is a bit more ceremony but works well.\nYou can find the source code, build instructions, and Chrome Web Store assets on GitHub.\nThis project was orchestrated into existence by Claude Code.\n","permalink":"/webpage-to-md/","summary":"A lightweight browser extension that converts webpages to Markdown, plain text, or downloadable .md files.","title":"Page to Markdown"},{"content":"How do you end an email? If you\u0026rsquo;re like most people, you\u0026rsquo;ve been cycling between \u0026ldquo;Best,\u0026rdquo; \u0026ldquo;Thanks,\u0026rdquo; and \u0026ldquo;Regards\u0026rdquo; for years. Maybe you throw in a \u0026ldquo;Cheers\u0026rdquo; when you\u0026rsquo;re feeling adventurous. I was stuck in this rut too, until I stumbled upon Meg Miller\u0026rsquo;s Are.na channel \u0026mdash; a crowd-sourced collection of over 2,100 creative email sign-offs.\nThe collection is wonderful. People have contributed sign-offs that range from the poetic (\u0026ldquo;With the warmth of a thousand suns,\u0026rdquo;) to the playful (\u0026ldquo;Sent from a carrier pigeon,\u0026rdquo;) to the deeply sincere (\u0026ldquo;Yours in quiet rebellion,\u0026rdquo;). It made me realize that the last line of an email is a small creative act that most of us have completely given up on.\nThe Hindi Problem But here\u0026rsquo;s the thing \u0026mdash; nearly all of these sign-offs are in English. I grew up speaking Hindi and Urdu, languages that are arguably even richer in the vocabulary of warmth, parting, and emotional nuance. The way you say goodbye in Hindi carries layers of meaning that \u0026ldquo;Best regards\u0026rdquo; could never dream of.\nSo I wrote 40+ original Hindi sign-offs. Some are translations of English favorites, but many are originals that draw on the poetic traditions of Hindi, Urdu and Hindustani. (I\u0026rsquo;ve also been collecting some cool Hindi/Hindustani words and phrases here.)\nहज़ार सूरजों की गर्माहट के साथ, With the warmth of a thousand suns,\nशांत विद्रोह में आपका, Yours in quiet rebellion,\nजब तक सितारे एक न हो जाएं, Until the stars align,\nआपकी यादों में खोया हुआ, Lost in your memories,\nThe Website I built a simple website that serves as both a randomizer and a browsable collection. Open it, click \u0026ldquo;Another one,\u0026rdquo; and you get a fresh sign-off in both English and Hindi, side by side. You can copy either version to your clipboard with one click, or download the entire collection as a Markdown file.\nThe tech is deliberately minimal \u0026mdash; it\u0026rsquo;s a single HTML file with no framework, no build step, no dependencies to install. The English sign-offs are pulled live from the Are.na API, and the Hindi translations are embedded in the page. 
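For anyone curious what pulling a channel from Are.na looks like, here is a rough Python sketch. The channel slug below is a placeholder (not the real channel\u0026rsquo;s identifier), and the response fields reflect my understanding of the v2 API, so treat this as a sketch rather than gospel:
import requests

# Placeholder slug -- not the channel's real identifier.
url = "https://api.are.na/v2/channels/some-channel-slug/contents"
resp = requests.get(url, params={"per": 25}, timeout=10)
resp.raise_for_status()
for block in resp.json().get("contents", []):   # response shape assumed
    if block.get("class") == "Text":            # text blocks hold the sign-offs
        print(block.get("content", "").strip())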
Open the file in a browser and you\u0026rsquo;re done.\nTry It Next time you\u0026rsquo;re about to type \u0026ldquo;Best,\u0026rdquo; stop. Open the sign-off randomizer instead. Your emails will be better for it.\nआपकी ईमेल की शोभा बढ़ाते हुए, Adorning your emails,\n","permalink":"/good-sign-offs/","summary":"2,000+ English email sign-offs from Are.na and 40+ original Hindi sign-offs, in a beautiful randomizer.","title":"Good Sign-Offs / अच्छी विदाइयाँ"},{"content":"THIS IS A WORK-IN-PROGRESS APP. KINDLY EXCUSE THE ROUGH EDGES. FEEL FREE TO TRY IT OUT AND OPEN ISSUES OR CONTRIBUTIONS ON GITHUB. Kalam (कलम) means \u0026ldquo;pen\u0026rdquo; in Hindi and Urdu, but it also carries the sense of speech and words — the things you write with a pen. It felt like the right name for an app that turns your voice into text.\nWebsite: https://harshvardhaniimi.github.io/kalam/\nThe Problem macOS has built-in dictation, but it sends your audio to Apple\u0026rsquo;s servers. There are third-party transcription services too, but they all want a subscription and your data. I wanted something that runs entirely on my Mac — no cloud, no subscriptions, no audio leaving my device.\nOpenAI\u0026rsquo;s Whisper model is one of the best open-source speech recognition systems available, and thanks to WhisperKit by Argmax, it runs natively on Apple Silicon using CoreML and the Neural Engine. That\u0026rsquo;s the foundation Kalam is built on.\nHow It Works Kalam sits in your menu bar as a small waveform icon. The fastest way to use it:\nPress Cmd+Shift+Space from anywhere in macOS\nSpeak\nPress Cmd+Shift+Space again\nThe transcribed text appears at your cursor and is copied to your clipboard\nThat\u0026rsquo;s it. Writing an email? Click in the body, speak, and the text appears. Taking notes in Obsidian? Same thing. The global hotkey works across all applications.\nYou can also open the menu bar popover for a more visual interface with a waveform display, or use the full window mode for drag-and-drop file transcription.\nModels Kalam supports five Whisper model sizes, from Tiny (75 MB) to Large (2.9 GB). The Base model is the default — it downloads automatically on first launch (142 MB) and can transcribe a minute of audio in about 6 seconds on Apple Silicon. If you need better accuracy, you can switch to a larger model in settings. All models support 50+ languages.\nPrivacy This was the whole point of building Kalam:\nAll processing happens locally on your Mac\nNo internet connection required (except for the initial model download)\nNo telemetry, no analytics, no cloud APIs\nAudio never leaves your device\nTranscription history is stored locally\nModels are downloaded once from Hugging Face and cached in ~/Library/Application Support/Kalam/models/.\nInstall The quickest way:\ncurl -sL https://raw.githubusercontent.com/harshvardhaniimi/kalam/main/install.sh | bash\nOr clone the repo and run ./build-app.sh to build from source. Requires macOS 14+ (Sonoma).\n","permalink":"/kalam/","summary":"A native macOS menu-bar app for speech-to-text transcription — completely local, private, and free.","title":"Kalam"},{"content":"Our work at HP Inc. 
on enterprise-scale demand forecasting was featured in Foresight: The International Journal of Applied Forecasting (Issue 79, 2025).\nI had the privilege of presenting this project at the International Institute of Forecasters’ Foresight Practitioner Conference, where HP was named one of five global finalists in the Forecasting in Practice Competition.\nPDF\nConference link\nPDF is shared with permission from the publishers, International Institute of Forecasters.\n","permalink":"/foresight-paper/","summary":"Our HP Inc. forecasting framework was featured in \u003cem\u003eForesight: The International Journal of Applied Forecasting\u003c/em\u003e (Issue 79, 2025) and recognized as a finalist at the International Institute of Forecasters’ Foresight Conference. \u003ca href=\"https://www.harsh17.in/docs/papers/HP_Foresight_Paper.pdf\"\u003e🔗 PDF\u003c/a\u003e","title":"Enterprise-Scale Machine Learning for Demand Forecasting"},{"content":"I was listening to a podcast by OpenAI where they tell us how they trained GPT-4.5, one of the largest models they have today. GPT-4.5 shows intelligence in unexpected ways, demonstrating common sense that other models totally miss. I haven’t used it much myself so I cannot comment on that, but at 38:38 in the video, Sam Altman asks Daniel Selsam “why does supervised learning work?”. Without skipping a beat, he replies “compression”. Then he explains that “the ideal intelligence is Solomonoff Induction”. Unfamiliar with this term, I jumped into a conversation with GPT-4.5 and along the way, I learnt several interesting things that I want to share with you all.\nSolomonoff Induction Imagine that you are given the following set of numbers and you have to predict the next one:\n2, 4, 6, 8, ___\nDid you guess 10? Why? Probably because that’s the simplest pattern — “add two each time”. But consider this alternative explanation — “Add two every time until you reach 8, then suddenly switch to adding five”. Then, the next number would be 13.\nBoth explanations match the observed data. Yet, intuitively, you’d bet on 10 because it’s simpler. But why exactly does simplicity feel right?\nSimplicity feels right because of Occam’s Razor: “Among competing hypotheses, choose the simplest one”.\nThen, in the 1960s, Ray Solomonoff had a brilliant idea. He decided to represent each possible explanation (or hypothesis) as a computer program. The shorter the program length (in bits or complexity), the simpler the hypothesis.\nFor example, there are several computer programs to decide the next number in our series 2, 4, 6, 8, ___:\nProgram | Length (in bits) | Interpretation\nPrint all even numbers | Short | Simple, general\nPrint 2, 4, 6, 8, then print 13, 18, … (add 5 to last number) | Medium | Complicated, special-case explanation\nPrint numbers 2, 4, 6, 8, followed by random unpredictable numbers | Very Long | Highly complex, arbitrary, no clear pattern\n(While Kolmogorov complexity—the length of the shortest program that generates your data—is a powerful theoretical idea, it’s actually uncomputable in practice, so we use description length as an ideal guide for simplicity.)\nThe shortest program often has the simplest interpretation. Supervised learning algorithms are attempts to compress all the information in data that is useful for decision making into a simple binary representation. Solomonoff said that to predict the next event, we should:\nConsider every possible program that could produce the observed data. 
Assign probabilities based on simplicity:\nShorter program → Higher probability\nLonger program → Lower probability\n\\(\\text{Probability} \\propto 2^{-\\text{program length}}\\)\nPredict the next event by taking a weighted average of the predictions from all programs.\nIn other words, we can imagine an infinite “multiverse” of programs generating your data. We don’t just pick one; we average over them all, weighing each according to simplicity.\nEven though it is impossible to use in practice, it guides us to understand that simpler algorithms that compress data better are almost always better representations of data. Additionally, since Solomonoff induction relies on an uncomputable prior, real-world machine learning uses practical substitutes like Minimum Description Length (MDL), Bayesian model averaging, or ensembling to approximate the idea of weighing simpler explanations more heavily.\nMultiverse? Now, while exploring Solomonoff induction, I stumbled upon its intriguing connection with another concept: the multiverse. Physicist Max Tegmark proposed four levels of the multiverse, each level capturing a different kind of parallel universe.1\nThe first level Tegmark describes includes regions beyond our observable cosmic horizon. Essentially, these universes are extensions of our own, governed by the same physical laws but possibly differing in their initial conditions.\nThe second level imagines universes arising from something called eternal inflation (recall that because of cosmic inflation our universe is constantly expanding), each potentially having different physical constants and laws of nature. Solomonoff induction would, in theory, incorporate these universes too, evaluating them based on how simple and computable their fundamental rules are.\nShorter, simpler descriptions (or programs) of universes would be considered more probable than complicated, special-case scenarios.\nThe third level relates directly to quantum mechanics, specifically the many-worlds interpretation. Here, every quantum event creates branching universes, each representing different outcomes. This concept resonates strongly with Solomonoff induction’s approach, where each potential future event is considered simultaneously, weighted by simplicity and computability. Each “branch” is akin to a different program output, contributing to the overall prediction.\nFinally, Tegmark’s fourth level—the “ultimate ensemble”—is the broadest and most abstract. It suggests that every mathematically consistent universe that can exist does exist. This level again matches perfectly with Solomonoff induction. Since Solomonoff induction evaluates every conceivable computable hypothesis, it inherently encompasses this idea, assigning probabilities to these universes based purely on the elegance and simplicity of their mathematical description.\nIt’s worth noting that in quantum mechanics, the many-worlds interpretation assigns probabilities to outcomes using the Born rule, not by simplicity—so the analogy with Solomonoff is more metaphorical than literal. Also, eternal inflation refers to the ongoing creation of new “pocket universes” with different constants, while the current expansion of our universe is driven by dark energy.\nOkay, but how is this related to language models? Bringing it back to language models: when models like GPT-4.5 predict text, they’re essentially compressing huge amounts of information into compact representations—similar to the spirit of Solomonoff induction. 
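To make that weighting rule concrete, here is a toy Python sketch of the \\(2^{-\\text{program length}}\\) prior over the three hand-written hypotheses from the table above. The bit-lengths are invented for illustration; they are not real program encodings:
# Toy Solomonoff-style averaging over hand-written hypotheses.
# The "bits" values are illustrative stand-ins for real program lengths.
hypotheses = [
    {"name": "add two each time",             "bits": 10, "next": 10},
    {"name": "add two, then switch to five",  "bits": 25, "next": 13},
    {"name": "memorize 2,4,6,8, then random", "bits": 60, "next": 42},
]

weights = [2 ** -h["bits"] for h in hypotheses]
posterior = [w / sum(weights) for w in weights]

for h, p in zip(hypotheses, posterior):
    print(f"{h['name']}: posterior {p:.8f}")

prediction = sum(p * h["next"] for h, p in zip(hypotheses, posterior))
print("weighted prediction for the next number:", round(prediction, 4))
The simplest hypothesis dominates the posterior, so the weighted prediction lands almost exactly on 10, which is the whole point.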
Training methods like cross-entropy loss connect to Minimum Description Length, rewarding models that capture patterns efficiently.\nWhen sampling outputs, techniques like temperature scaling and nucleus sampling let us explore a range of possible continuations, almost like averaging predictions from multiple “programs” or hypotheses. In practice, ensembling and mixture-of-experts architectures mimic the idea of averaging over many models, echoing the theoretical blend of simplicity and diversity that Solomonoff induction imagines.\nIs Multiverse Real? We don’t know. But as a fan of “Everything Everywhere All at Once”, I would totally say “Yes!”.\nSee “Physics in the multiverse: an introductory review” by Aurélien Barrau for a brief introduction to multiverse theory in physics.↩︎\n","permalink":"/language-models-and-multiverse/","summary":"\u003cp\u003eI was listening to a podcast by OpenAI where they tell us how they trained GPT-4.5, one of the largest models they have today.\nGPT-4.5 shows intelligence in unexpected ways, demonstrating common sense that other models totally miss.\nI haven’t used it much myself so I cannot comment on that, but at \u003ca href=\"https://youtu.be/6nJZopACRuQ?t=2318\"\u003e38:38\u003c/a\u003e in the video, Sam Altman asks Daniel Selsam “why does supervised learning work?”.\nWithout skipping a beat, he replies “compression”.\nThen he explains that “the ideal intelligence is Solomonoff Induction”.\nUnfamiliar with this term, I jumped into a conversation with GPT-4.5 and along the way, I learnt several interesting things that I want to share with you all.\u003c/p\u003e","title":"Language Models and Multiverse"},{"content":"Last month, I spent a week in Tirana, Albania. Nestled across the Adriatic from Italy and bordered by Montenegro, Kosovo, North Macedonia, and Greece, Albania has long been one of Europe’s hidden secrets. Until 1991, this small country lived in almost total isolation under a communist dictatorship. After communism collapsed, Albania became a democratic republic. In 2009, it joined NATO and today stands as an EU candidate state, hoping to become a full member soon.\nMy friend Dea Bardhosi, whom I met during my internship at HP three years ago, happened to be visiting her home at the time. Before starting my new job, I decided to take a week-long vacation — and exploring a new country with a local makes all the difference. Dea, along with her sister Keti and their parents, made my trip truly special. I’m already looking forward to visiting them again soon.\nHistory of Albania The territory of modern-day Albania was inhabited in classical antiquity by various Illyrian tribes. Between 200 and 400 BCE, Greek colonizers established trading posts along its coastline, most notably in Durrës—a beach town near present-day Tirana. Following the Illyro-Roman Wars (229–168 BCE), the region was absorbed into the Roman Empire. After the empire’s division in 395 CE, Albania came under the Eastern Roman (Byzantine) Empire.\nIn 1444, the Albanian nobleman George Kastrioti—better known as Skanderbeg—united feudal lords in a military alliance to resist the advancing Ottoman Empire. Skanderbeg, now celebrated as a national hero with the main square in Tirana named after him, held off Ottoman forces for over two decades. After his death, and within a decade, Albania fell under Ottoman rule. 
Yet the sense of national identity he forged left a deep and lasting mark on Albanian nationalism.\nSkanderbeg Square, Tirana, Albania.\nAs the Ottoman Empire weakened, Albania declared its independence on November 28, 1912. The next two and a half decades were tumultuous, marked by the reign of two kings and occupation during both World Wars. These events eventually set the stage for Albania to become a socialist (“Stalinist”) state under Enver Hoxha, the First Secretary of the Labour Party. From 1944 until his death in 1985, Hoxha ruled Albania as a rigid, Stalinist one-party regime defined by forced collectivization, state-enforced atheism (making Albania the world’s first officially atheist country), and extreme international isolation.\nStalin’s death in 1953 and Khrushchev’s speech “On the Cult of Personality and Its Consequences” denouncing Stalin’s methods in 1956 alarmed Hoxha.1 He viewed de‑Stalinization as a betrayal of Marxism‑Leninism. By 1961, Albania formally split from the USSR, expelled all Soviet advisers, and joined with Chairman Mao’s China as fellow anti‑revisionists. Through the 1960s and early ’70s, Albania received Chinese economic and military aid.\nBut as Mao’s successors began opening China to foreign investment and rapprochement with the West (notably Nixon’s visit in 1972), Hoxha denounced China as “revisionist” as well. By 1978 all Chinese personnel were gone, and Albania embarked on its “self‑reliance” policy. From then until his death in 1985, Hoxha presided over Europe’s North Korea.\nHe persecuted religious institutions, banned almost all private enterprise, and enforced strict travel restrictions. His regime executed or imprisoned thousands deemed opponents, and Albania became one of the most repressive states in Europe. For more on this and other historical stories of Albania, one should read Lea Ypi’s “Free”.\nThe list of “tortures” committed during Enver Hoxha’s regime is horrendous — FAIR WARNING. “Strapping dynamite on the body”, “burning of sexual organs with petrol”, “putting inside the coffin alive”, “crushing breasts with pliers”, “cat with claws put inside underpants of women and then hit with a wood”. These are beyond hurtful.\n36 Types of Tortures Committed By The Investigator During The Regime. Courtesy: “House of Leaves” museum.\nUpon Hoxha’s death in 1985, his successor Ramiz Alia cautiously introduced reforms under pressure from domestic unrest and Gorbachev’s Soviet policies.2 By 1991 multi‑party elections had been held, and in March 1992 the center‑right Democratic Party assumed power, marking the end of the communist system. The 1998 constitution enshrined rule of law and human rights. Despite severe economic collapse and the 1997 crisis, Albania stabilized in the early 2000s, joined NATO in 2009, and is now an official candidate for European Union accession.\nHow is Albania Today? Today, Albania is a modern European nation, home to many beautiful attractions and stunning beaches. I was fortunate to be hosted by my friend Dea Bardhosi, who, along with her sister Keti and their parents, generously showed me around the cities of Tirana and Durrës. Here are some highlights from my trip.\nDea Bardhosi and I at the friendship monument in Tirana.\nI landed at Tirana’s Nënë Tereza Airport (Mother Teresa Airport, TIA), which has direct connections to major European cities such as London, Paris, Munich, Frankfurt, and others, as well as flights to and from Dubai. 
The airport is named after Mother Teresa, the renowned Catholic nun known for her service to India’s poor and sick. Interestingly, although I always assumed she was Indian, she was actually Albanian by birth and became one of the first foreigners to acquire Indian citizenship through naturalization. Despite her iconic status in Albania, Mother Teresa remains somewhat controversial in India, where she has been criticized for her practice of withholding anesthetics from patients, stemming from her belief that suffering brought one closer to God.\nEnver Hoxha Mausoleum “Pyramid” One of Tirana’s most iconic landmarks is the Pyramid of Tirana. Originally opened in 1988 as the Enver Hoxha Museum, it was designed to showcase the legacy of Albania’s communist dictator, Enver Hoxha. Although sardonically referred to as the “Enver Hoxha Mausoleum,” it was likely never intended to house Hoxha’s remains. Today, the Pyramid serves as a popular local attraction and event venue.\nThe Pyramid of Tirana.\nLocal Architecture in Tirana Albania’s Prime Minister since 2013, Edi Rama, is a former art professor and writer whose artistic sensibility is unmistakable throughout Tirana. He’s known for doodling on Microsoft Outlook printouts during ministerial meetings to help him concentrate—a detail that says a lot about his character. After the fall of communism, when Rama became mayor of Tirana, his unconventional priority was to paint the city’s drab buildings in vibrant colors, even as basic infrastructure like street lights was failing. At first, this struck many as an odd choice. But over time, his policy began to work. As Rama recalled in an interview with The Guardian: “It had a chain effect I didn’t imagine. Once the buildings were coloured, people started to get rid of the heavy fences of their shops. In the painted roads, we had 100% tax collection from the people, while tax collection was normally 4%. People accepted to pay their share for the city, because they realised that through the colours the city exists.”\nHis impact is evident everywhere: in Tirana, there’s hardly a building without character—a fact even I, with no artistic training, can’t help but notice.\nTirana’s New Mosque In 2024, the Namazgah (Grand) Mosque of Tirana was inaugurated by Turkish President Recep Tayyip Erdoğan. The mosque was built almost entirely with funding from the Turkish government—about $30 million, provided through Turkey’s Directorate of Religious Affairs (Diyanet). Its location near the Albanian parliament underscores the growing acceptance of Islam in a country that was, until recently, officially atheist. Today, it stands as the largest mosque in Albania, accommodating 8,000 people inside the main dome and an additional 2,000 in the courtyard. The Islamic architecture is strikingly beautiful.\nLocal Food in Tirana I had the pleasure of sharing a traditional lunch with Dea’s family, and the local food was delicious. We enjoyed goat shoulder, a platter of traditional Albanian dishes, and finished with flan accompanied by their classic espresso. The meal was made even more memorable by the setting—at the top of the Dajti Mountains, which you can reach from Tirana by cable car (except on Tuesdays).\nLocal Market Tirana has local markets where one can shop for a variety of fresh fruits and vegetables.\nFroyo: Frozen Yogurt This was probably the first time I had frozen yogurt—and it’s likely one of the best I’ll ever have. 
Albanians serve their froyo with a variety of fruit jams (strawberry, avocado, blueberry) and toppings like crushed almonds and shredded coconut. It was so good that I had it again in Durrës. I hope I get to try it again someday.\nPetulla: Albanian Fried Donuts Petulla is a simple yet delicious Albanian version of donuts. I definitely plan to try making them myself sometime. They’re typically enjoyed with fruit jam and cheese.3\nBeachtown: Durrës Durrës is the second-most populous city in Albania, after Tirana. It’s actually older than Tirana, having been a significant center during both the Greek and Roman eras. The city is also home to an ancient Roman amphitheater, which—while smaller—bears some resemblance to the Colosseum in Rome.\nThe beach at Durrës is very clean, though quite shallow; the water near the shore barely reached my shoulders, at about a meter deep. The sea is clear and the waves are gentle, making it ideal for swimming. I thoroughly enjoyed my time in the water with Dea.\nLiquor to Try: Raki Albanian raki (raki rrushi, or simply “raki”) is a strong fruit brandy that has been distilled in rural Albania for centuries. The recipe varies by region: northern Albania typically produces grape-based raki (raki rrushi), while in central and southern areas, you’ll also find plum (slivovica), apricot (kajsie), and pear (dardhë) versions.\nI picked up a bottle at the duty-free in TIA airport on my way back and have enjoyed it both as a shot and mixed into a cocktail with lychee, lemon, and sugar.\nConclusion My week in Albania gave me far more than a checklist of sights—it offered a window into a nation reshaping itself after decades of isolation and upheaval. Tirana’s colorful streets, Durrës’ ancient ruins and gentle beaches, the warmth of Albanian hospitality, and the taste of local food and raki all left a vivid impression. Albania is still carving out its place in Europe, balancing old wounds with new energy and optimism.\nIf you’re looking for a travel experience that’s both welcoming and thought-provoking, Albania is well worth a visit. I’m grateful to Dea and her family for making me feel at home in a country still discovering itself—and I look forward to returning someday.\nIn this speech, Nikita Khrushchev condemned Stalin’s brutal purges, mass arrests, and executions, and denounced the creation of a personality cult that placed Stalin above the party and the people. Read the full speech here and find more background on Wikipedia.↩︎\nSee US President Ronald Reagan’s speech “Mr. 
Gorbachev, Tear Down This Wall!”.↩︎\nRecipe: https://mediterraneanlatinloveaffair.com/albanian-fried-dough-petulla/↩︎\n","permalink":"/albania/","summary":"\u003cp\u003eLast month, I spent a week in Tirana, Albania.\nNestled across the Adriatic from Italy and bordered by Montenegro, Kosovo, North Macedonia, and Greece, Albania has long been one of Europe’s hidden secrets.\nUntil 1991, this small country lived in almost total isolation under a communist dictatorship.\nAfter communism collapsed, Albania became a democratic republic.\nIn 2009, it joined NATO and today stands as an EU candidate state, hoping to become a full member soon.\u003c/p\u003e","title":"A Week in Tirana, Albania"},{"content":" A passport is “an official travel document issued by a government that certifies a person’s identity and nationality for international travel.” The word comes from the medieval French passer (“to pass”) and port (“harbor”): originally a document allowing you to pass through a port town’s gate. The document requests all border protection agencies and governments to give safe passage to the bearer. It validates the identity of the concerned citizen. My Indian passport specifically says:\nThese are to request and require in the name of the President of the Republic of India, all those whom it may concern to allow the bearer to pass freely without let or hindrance and to afford him or her every assistance and protection of which he or she may stand in need.\n— By The Order of President of Republic of India\nToday, passports are one of the most widely accepted forms of identification worldwide, thanks to the International Civil Aviation Organization (ICAO), a United Nations agency, which sets global standards for travel documents. It dictates the features of passports, like machine-readable zones which enable automated immigration system updates, and biometric e-passports which contain a chip with facial and fingerprint information.1\nWhile a passport looks like a routine booklet, its true power lies in the diplomatic and economic bargains stamped inside its pages.\nPassport Through the Ages Although the concept of the modern passport is largely a post-World War II development — mostly to prevent espionage and to verify identities across the porous borders of European countries — passports as documents asking for safe passage are very old. One of the oldest instances is found in the Hebrew Bible (Nehemiah 2:7–9, 450 BCE) when Nehemiah, an official serving King Artaxerxes I of Persia, asked permission to travel to Judea; the king granted leave and gave him a letter “to the governors beyond the river” requesting safe passage for him as he traveled through their lands.\nIn ancient India, the Arthashastra (3rd century BCE) describes the duty of the Mudrādhyakṣa (lit. ‘Superintendent of Seals’), who must issue sealed passes before a person could enter or leave the countryside. 
Ancient Chinese kingdoms, like the Western Han, issued passports (傳; zhuan), which determined a person’s ability to move throughout imperial counties and through points of control.\nHere are some old passport images, courtesy Wikipedia.\nFirst Japanese passport, issued in 1866\nChinese passport from the Qing dynasty, 24th Year of the Guangxu Reign, 1898\nAn Ottoman passport (passavant) issued to a Russian subject dated July 24, 1900\nWorld War II Spanish official passport issued in late 1944 and used during the last six months of the war by an official being sent to Berlin.\nAnother fun fact: With fewer than 800 citizens, the Vatican issues some of the rarest passports—and virtually every holder is either clergy or a Swiss Guard.\nBeautiful Passports While my Indian passport is pretty ordinary (pun intended), there are several countries whose passports are a delight to look at. For example, the Australian passport has hidden images that only show up in UV light, with even the thread acting as an Aboriginal flag!\nHong Kong’s passport is a marriage of old-world artistry and its futuristic skyline.\nNew Zealand’s passport is a beautiful tribute to the explorers who discovered the nation, intricately tying Maori designs with modern ones.\nSwitzerland’s passport shows each region’s topography map under UV light.\nThe United States’ passport features the country’s founding principles. Each two-page spread is marked with a quote from a historically important person. It is also the only passport I’ve seen which comes with eight pages of usage guidelines.\nThe Turkish passport includes local flowers under UV light and historical buildings under any light.\nSource: The Most Beautiful Passports in the World: Regula’s Pick (2025)\nAllow Bearer to Pass Freely While most passports ask the border officials inspecting the document to “let the bearer pass freely”, that request isn’t always honored. Not all passports are equal. Some passports, like those of the United Arab Emirates (UAE), Japan and Singapore, are welcomed by almost all countries. Others, like those of Afghanistan and Pakistan, require a visa for almost all countries. Thus, I wanted to explore how “powerful” each country’s passport is.\nThere are several ways to measure how “powerful” a country’s passport is. Consider India as an example.\nVisa-free: The number of countries that an Indian national doesn’t require a visa to visit. 30 countries allow visa-free access to Indians. Singapore leads this measure with visa-free access to 137 countries.\nVisa-on-arrival: At the port of entry, many countries grant an easy visa to an Indian national. 45 countries give visa-on-arrival to Indians. Australia and Peru lead this measure with visa-on-arrival access to 54 countries each.\nLand Power: The percentage of earth’s surface that an Indian national can visit. Indians can visit 34.5% of earth’s landmass. Spain leads this measure with access to 88.4% of earth.\nThe infamous Henley Passport Index has a slightly different methodology: it uses proprietary information from the International Air Transport Association (IATA) Timatic database, which also provides essential travel information to international airlines. They don’t count e-Visas towards free travel, which another (more user-friendly) tracker, Passport Index, does. Consequently, there are minor differences in the ultimate results.\nHowever, it is clear that there are some countries that enjoy free travel in more countries than others. 
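The tallying behind these measures is simple. Here is a rough pandas sketch against the CSV linked in this post’s footnotes; the column names are my guesses, not the file’s actual headers, and the post’s own analysis was presumably done in R:
import pandas as pd
import statsmodels.formula.api as smf  # pip install pandas statsmodels

# Raw view of the data file linked in the footnotes.
url = ("https://raw.githubusercontent.com/harshvardhaniimi/blog/main/"
       "content/posts/2025-06-24-passports/passport_plot_data.csv")
df = pd.read_csv(url)

# Hypothetical column names -- adjust to the file's real headers.
df["travel_easy"] = df["visa_free"] + df["visa_on_arrival"]
print(df.sort_values("travel_easy", ascending=False).head(10)[["country", "travel_easy"]])

# The regressions reported later in the post are then one line each:
fit = smf.ols("travel_easy ~ gdp_per_capita", data=df).fit()  # column name assumed
print(fit.params, fit.rsquared)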
So, that begets the question — what countries’ passports are more powerful than others?\nPowerful Passports To answer the question, I calculate a “Travel Easy Index”, which is simply the count of countries that let you in easily (visa-free or visa-on-arrival including e-TA). Data is sourced from the Passport Index website and the World Bank for GDP Per Capita (PPP).2 3\nWhen we map out global “Passport Power”, we’re really charting the footprint of a nation’s diplomatic reach, economic heft, and reciprocal trust agreements. At the top of the list you’ll see Japan, Singapore, Germany, each offering its citizens visa-free access to well over 180 destinations.4 These passports earn their strength from decades of stable foreign policy, participation in large trade blocs (like EU/Schengen or ASEAN), and rigorous security standards that give other governments confidence when admitting the bearer.\nJust behind them, the United States, United Kingdom, Canada and Australia hover in the 150–170 range. Their passports remain formidable largely because of widespread bilateral visa-waiver programs: the U.S. Visa Waiver Program alone opens doors to 40 European and Asian states, while the UK’s Commonwealth connections smooth the way in parts of Africa and the Caribbean. Canada and Australia combine their own economic clout with selective agreements—Australia’s eTA for Canada and vice versa being one example—that boost mutual access without sacrificing border control.\nOn the other end of the spectrum, you’ll notice many developing-economy passports colored pale on our map. These lower rankings reflect a combination of limited diplomatic networks, concerns over overstays or economic migration, and, in some regions, lingering security risks. Small island nations and certain African and South Asian states often negotiate visa waivers last, since major powers prioritize reciprocity with strategic or high-volume partners.\nPassport power isn’t static. In the last five years we’ve seen the United Arab Emirates climb more than 30 places after systematically signing new waiver agreements across Asia, Europe, and Latin America. Likewise, political crises can trigger sudden downgrades—as travel bans and suspensions ripple through reciprocal programs. For instance, Russia’s passport has lost easy travel access to 15-20 countries since 2021, owing to the Ukraine war.\nWelcoming Countries “Vasudhaiva Kutumbakam” is a Sanskrit phrase meaning “the whole world is one family”. India, during its G20 presidency in 2022-23, used this as the motto for the G20 events. However, it is obvious that some families are more welcoming than others. If you have ever applied for a United States visa from a developing country, you’d probably know what I’m talking about. When my parents applied for their U.S. tourist visa, it took them 1.5 years just to get an interview appointment at the consulate.\nSo, what countries are most welcoming? Asian countries are leading — Vietnam, Thailand and more.\nWhen we render the Welcoming Score choropleth, one fact jumps out: it isn’t the world’s richest or most powerful states that lead on hospitality, but a cluster of sub-Saharan African nations. In 2025, nine countries admit every one of the 198 foreign passports visa-free or with visa-on-arrival! That list reads like a tour of West and Central Africa’s open-door pioneers: Togo, Equatorial Guinea, Côte d’Ivoire, Guinea, Comoros, Djibouti, Guinea-Bissau, Burundi and Nigeria. 
By eliminating practically all entry barriers, these governments have signaled that attracting visitors—whether tourists, investors or members of their diaspora—is a strategic priority.\nJust below that “198-club” sits a second tier of African states and a handful of island nations. Countries like Rwanda, Mozambique and Seychelles (each scoring 196–197) plus the Maldives and Samoa have likewise streamlined e-visa platforms or visa-on-arrival schemes to broaden access.\nContrast this with many Western and East Asian states, where high GDP per capita (PPP) and strong passports for their citizens don’t always translate into openness for inbound travelers. Europe’s Schengen Area and North America maintain tighter controls—often justified by security or immigration-management concerns—so they rarely break into the top 20 on the Welcoming Score.\nIn short, our map paints a clear picture: the most welcoming places in the world aren’t necessarily the wealthiest, but those that have made a conscious policy choice to throw their doors wide open. Africa and East Asia lead by example, proving that visa-free access can be a powerful catalyst for growth, goodwill and global connection.\nDoes Wealth Predict Passport Power? To test whether richer countries really do have stronger passports, I ran two simple regressions of our Travel Easy and Welcoming scores on GDP per capita.\nYou can draw two clear, contrasting lessons from these regressions.\nFirst, GDP per capita is a strong predictor of your Travel Easy Index: the coefficient of 0.00111 means that for every extra $1,000 in GDP per capita, a country can expect roughly one more visa-free or visa-on-arrival destination. The relationship is highly significant (\\(p\u0026lt;2 \\times 10^{-16}\\)) and explains about 52 percent of the variation in passport power (\\(R^2 \\approx 0.52\\)). In plain English: wealthier countries tend to negotiate more visa waivers, but GDP only tells half the story—other factors like diplomatic ties, security partnerships and regional blocs fill in the rest.\nBy contrast, GDP per capita barely explains the Welcoming Score. The slope is slightly negative (–0.000393), suggesting richer nations admit marginally fewer visitors without advance visas, but the effect is very small and only accounts for about 4 percent of the variation (\\(R^2 \\approx 0.04\\)). In other words, economic wealth doesn’t much predict how open a country is to incoming travelers. Some lower-income countries have adopted extremely liberal entry policies, and many wealthy ones maintain tighter controls.\nModel | Term | Estimate | Std. Error | p-value\nTravel Easy | (Intercept) | 81.5109 | 3.1924 | 0.000\nTravel Easy | GDP_Per_Capita | 0.0011 | 0.0001 | 0.000\nWelcoming Score | (Intercept) | 126.2201 | 5.6594 | 0.000\nWelcoming Score | GDP_Per_Capita | -0.0004 | 0.0001 | 0.006\nModel | R²\nTravel Easy | 0.52\nWelcoming Score | 0.04\nConclusion In a world where a simple booklet can open—or close—borders, passports embody far more than identity: they trace the arc of diplomacy, economic strategy and mutual trust. From the ornate UV inks of New Zealand’s pages to the visa-free corridors enjoyed by Emiratis, we see how history and policy converge on paper. At the same time, the surprising generosity of many African and Asian states reminds us that openness is often a deliberate choice, not merely a by-product of wealth. As you plan your next trip, remember that behind every stamp lies a story of negotiation, goodwill and the promise of shared humanity. 
I definitely will.\nMalaysia rolled out the world’s first biometric “e-passport” in 1998, ahead of ICAO’s 2004 mandate for chip-based travel documents.↩︎\nComplete data is available on my GitHub at https://github.com/harshvardhaniimi/blog/blob/main/content/posts/2025-06-24-passports/passport_plot_data.csv↩︎\nEverywhere in this post, I’ve used GDP per capita with purchasing power parity (PPP), sourced from the World Bank for the year 2023 in current international dollars. See https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD↩︎\nCuriously, those who lost World War II gained a lot subsequently.↩︎\n","permalink":"/passports/","summary":"\u003cp\u003e\u003cimg src=\"images/all-passports.png\" /\u003e\u003c/p\u003e\n\u003cp\u003eA passport is “an official travel document issued by a government that certifies a person’s identity and nationality for international travel.” The word comes from the medieval French \u003cem\u003epasser\u003c/em\u003e (“to pass”) and \u003cem\u003eport\u003c/em\u003e (“harbor”): originally a document allowing you to pass through a port town’s gate.\nThe document requests all border protection agencies and governments to give safe passage to the bearer.\nIt validates the identity of the concerned citizen.\nMy Indian passport specifically says:\u003c/p\u003e","title":"Power of Your Passport"},{"content":"My three-year-long collaboration with HP Inc.—including an 18-month internship with their Strategic Planning and Modeling team—has culminated in a new publication in the INFORMS Journal of Applied Analytics. This paper is also the primary research output from my dissertation, co-authored with the HP team and my advisor, Dr. Charles Liu. I’m especially grateful to Cara Curtland at HP, whose partnership was instrumental in executing this work.\nAbstract HP Inc. manufactures and sells more than 18,000 print-related products in over 170 countries. Accurate forecasting of the heterogeneous and dynamic demand is vital to support supply planning decisions for manufacturing, inventory management, shipment scheduling, and ultimately, customer satisfaction. Forecasting higher or lower than actual demand results in excess or shortage that reduces profitability and impacts on-time delivery to customers.\nHistorically, supply planning depended on (1) a consensus demand forecasting approach, which requires manual collection and integration of information by the forecasting experts, and (2) statistical time-series forecasting models. The consensus forecasting approach also requires frequent corrections if some uncertainties in the demand are not accounted for when releasing the forecasting results. Traditional time-series models can work automatically without frequent correction, but their forecasting performance is unsatisfactory because of oversimplified modeling inputs and assumptions.\nIn this project, we document the process of using machine learning (ML) techniques across all print products at HP Inc., worldwide. Our aim is to automate the forecasting process with high accuracy and to integrate those results into a human-in-the-loop process that merges the strengths of ML, statistical, and consensus forecasting. 
Our tree-based (LightGBM) forecasting model reduced systematic errors in comparison with the existing consensus and statistical forecasting approaches, and was deployed as an integrated part of HP Inc.\u0026rsquo;s forecasting process.\nFurthermore, our ML framework establishes a strong foundation for further methodological improvements in the ML algorithm. We report extensive empirical evidence guiding our methodology design and demonstrating the business implications of our project. We also share several important principles we have applied to manage team-based collaboration for an enterprise-scale project and to ensure the success of our ML-based demand forecasting.\nLinks DOI: https://doi.org/10.1287/inte.2024.0126\nPDF: Finally-accepted Preprint\n","permalink":"/print-demand-forecasting-with-machine-learning-at-hp-inc/","summary":"HP Inc. replaced manual and statistical forecasting with a machine learning (LightGBM) model to improve demand prediction accuracy across 18,000+ print products. The model has been deployed enterprise-wide, with demonstrated business value and principles for scaling ML in large organizations. \u003ca href=\"https://www.harsh17.in/docs/papers/HP_Paper_IJAA_Preprint.pdf\"\u003e🔗 PDF\u003c/a\u003e","title":"Print Demand Forecasting with Machine Learning at HP Inc."},{"content":"This is my Ph.D. dissertation. I earned my Ph.D. in Business Analytics from the Haslam College of Business, University of Tennessee, USA, in May 2025. My primary advisor was Dr. Chuanren (Charles) Liu.\nFull dissertation (PDF)\nOfficial version (University of Tennessee TRACE)\nBlog post\nAbstract Accurate demand forecasting is critical for operational efficiency and strategic decision-making in large-scale enterprises. This dissertation presents a machine learning (ML)-driven demand forecasting framework implemented at a Fortune 500 company, HP Inc., focusing on three key areas: ML-based predictive modeling, MLOps and deployment scalability, and Human-in-the-loop forecasting integration. Additionally, we explore how predictive optimization enhances decision-making through end-to-end learning.\nThe first contribution involves the development of a scalable ML-based forecasting system, leveraging tree-based models (LightGBM), feature engineering, and advanced time-series methodologies. The model captures complex demand drivers, including macroeconomic trends, product life cycle effects, and channel inventory dynamics. By transitioning from traditional statistical models to ML-based approaches, the framework improves forecasting accuracy in key metrics while adapting to evolving market conditions.\nThe second contribution addresses MLOps and enterprise-scale deployment challenges, ensuring model reliability, automation, and reproducibility. The research outlines best practices in model monitoring, version control, model deployment, and continuous learning pipelines, demonstrating how systematic ML deployment reduces technical debt and maintains forecast accuracy over time.\nThe third contribution integrates \u0026lsquo;Human-in-the-Loop\u0026rsquo; forecasting, ensuring that ML predictions are refined through expert-driven consensus mechanisms. The system incorporates business intelligence inputs such as sales insights, promotional strategies, and market conditions, balancing data-driven automation with human expertise to enhance interpretability and trust in forecasts. 
Through this closed-loop process, we are able to improve the overall forecast accuracy by 34% (wMAPE) and reduce inventory by 28% while maintaining the same service levels.\nFinally, this dissertation presents a predictive optimization framework that transforms ML-based predictions into actionable strategies. We showcase how perfect predictions still don\u0026rsquo;t lead to perfect decisions through a simulation study. Subsequently, we propose an end-to-end learning paradigm that simultaneously addresses demand forecasting, inventory allocation, procurement planning, and production scheduling in the supply chain.\n","permalink":"/dissertation/","summary":"My Ph.D. dissertation (University of Tennessee, 2025) develops a machine-learning-driven demand forecasting framework implemented at HP Inc., improving forecast accuracy by 34% and reducing inventory by 28%. \u003ca href=\"https://www.harsh17.in/docs/2025_04_10_Doctoral_Dissertation.pdf\"\u003e🔗 PDF\u003c/a\u003e","title":"From Data to Decisions: Enterprise Demand Forecasting with Machine Learning"},{"content":"On March 31, 2025, I successfully defended my dissertation: “From Data to Decisions: Enterprise Demand Forecasting with Machine Learning.” My work is rooted in generalizable research at the intersection of machine learning, operations research, and organizational decision-making, grounded through a real-world implementation at HP Inc.\nThe final accepted draft of my dissertation is available here.\nWhat is my dissertation about? Demand forecasting has a rich intellectual and practical history. Ancient texts like the Indian Arthashastra (350 BCE) and Chinese Han Dynasty archives both emphasized blending qualitative judgment with quantitative grain records to estimate demand. Fast forward to the industrial age: companies like Ford and Chrysler pioneered judgmental forecasting to support assembly lines. In the 1960s, statisticians like Box, Jenkins, Holt, and Winters developed foundational time-series methods like ARIMA and exponential smoothing, which still serve as industry baselines.\nYet these classical models have inherent limitations: they treat each time series in isolation, rely solely on historical demand, and often assume simplistic linear patterns. As demand environments became more volatile and complex—spanning thousands of SKUs across global markets—the need arose for scalable, data-driven approaches.\nMachine learning (ML) entered as a powerful alternative. ML models can incorporate high-dimensional data, learn nonlinear patterns, and integrate external signals like marketing promotions or macroeconomic conditions. Most importantly, they enable a shift from siloed, local models to global forecasting systems that learn across products and markets.\nFrom Concept to Practice: Generalizing the Framework My dissertation develops and validates a comprehensive ML-based forecasting framework, centered on scalability, interpretability, and organizational integration. Theoretically, it proposes a hybrid forecasting system combining tree-based models (LightGBM), rich feature engineering, and human-in-the-loop design. Empirically, it demonstrates how this architecture improves forecast accuracy and operational responsiveness in enterprise settings.\nRather than treating the HP implementation as the core, it serves as a rigorous testbed for generalizable insights. 
For example, the use of LightGBM wasn’t simply because it worked well at HP—it was chosen based on comparative performance in the M5 forecasting competition and its effectiveness on tabular, structured enterprise data. The use of global modeling across SKUs and countries reflects broader trends in ML demand forecasting research and is designed to be transferable to other firms.\nCase Application at HP Inc. HP sells over 18,000 print SKUs in 170+ countries. Forecasting demand across this matrix is essential for production scheduling, R\u0026amp;D budgeting, and supply chain planning. Historically, HP used two methods: statistical models like ARIMA for baseline estimates, and consensus forecasts refined by domain experts. In 2019, the Strategic Planning and Modelling (SPaM) team initiated a transition toward ML-based forecasting, which I helped design and implement during a 16-month internship.\nWe built an end-to-end pipeline using LightGBM models trained on historical sales, inventory levels, product life cycle indicators, and regional factors. The system was built to generalize across product hierarchies and geographies, while remaining interpretable to planners. Forecasts were served through interactive dashboards, enabling human planners to review, adjust, or override them based on local knowledge—thus creating a robust human-in-the-loop architecture.\nMLOps for Forecasting at Scale A key research contribution lies in operationalizing ML systems. Our MLOps framework emphasized reproducibility, automation, and governance. We used parameterized notebooks, MLflow tracking, and structured version control to ensure that model training, evaluation, and deployment were seamless and auditable. Model retraining pipelines ensured continuous learning from new data while monitoring systems flagged drift in prediction quality.\nThis MLOps architecture isn’t unique to HP; rather, it offers a replicable playbook for any enterprise seeking to productionize ML-based forecasting.\nHuman + Machine: A Symbiotic System One of the most important lessons from this work is that automation does not eliminate the need for human expertise. On the contrary, machine learning is most effective when paired with human intuition and domain knowledge. The implemented forecasting system allows planners to adjust forecasts using soft signals—like expected promotions or supply chain disruptions—and provides transparent explanations to encourage trust.\nOver a three-year period, this hybrid approach improved forecast accuracy by 34% and reduced inventory holdings by 28%, all while maintaining service levels. These numbers are impressive, but more importantly, they validate the core research insight: ML and human forecasting are not mutually exclusive—they are complementary.\nTo Conclude My research bridges the gap between algorithm design and real-world decision-making. While the implementation at HP Inc. serves as a case study, the core ideas—scalable ML systems, human-in-the-loop forecasting, and robust MLOps infrastructure—are meant to inform a broader audience of researchers and practitioners. The lessons extend beyond any one company, pointing toward a future where intelligent systems amplify human decision-making at scale.\nRelated Publications This research has been accepted for publication in the INFORMS Journal of Applied Analytics, where we present a detailed account of the modeling framework and enterprise deployment. 
A preprint is available here.\nWe also presented this work at the Foresight Practitioners Conference 2025, where it was recognized as a Top-5 finalist globally for the IIF Forecasting Practice Competition. An executive summary tailored for industry audiences is forthcoming in Foresight: The International Journal of Applied Forecasting.\nBehind the Research: People Who Made It Possible This dissertation—and everything it represents—would not have been possible without the kindness, intellect, and support of people across continents and stages of life.\nFirst and foremost, my advisor, Dr. Chuanren (Charles) Liu, has been the intellectual anchor of this journey. His rare combination of sharp research instincts, generous time, and thoughtful critique shaped not only this dissertation but my growth as an independent researcher. He gave me remarkable freedom in defining my research path—a freedom I now recognize, in hindsight, as a true gift. I once discussed with Prof. Bobby how some advisors assign their own topics, making things easier in the short term but stifling long-term independence. That conversation only deepened my appreciation for Dr. Liu’s mentorship. I owe him more than I can ever express.\nMy internship and continued collaboration with HP have been central to this work. It wasn’t just a corporate stint—it became a living laboratory for my ideas. I’m grateful to Cara Curtland, Adam Ghozeil, and Jerry Hwang for their generous support in translating research into real-world impact. What made it special was HP’s willingness to treat this not as a mere “intern project,” but as a serious global initiative worthy of scale and rigor. That trust means a lot.\nDr. Sean Willems’ early mentorship gave me the mental scaffolding I still build on. His research remains a North Star, especially when I’ve felt adrift. His advice—often tucked inside thoughtful, incisive emails—has shaped my thinking in more ways than I can count. I’ve saved many of those emails, and I revisit them often.\nI’ve been fortunate to build meaningful relationships with many professors at UT. Dr. Michael Galbreth, our department head, has a rare “yes-first” leadership style. His answer was always, “Let’s see how,” even when we asked for something as rare as a grad student pay raise. Dr. Wenjun Zhou has been a consistent source of encouragement—I hope we can collaborate on research in the future. Dr. Robert Mee offered guidance at critical junctures. Dr. Emre Demirkaya made “Statistical Learning” one of the most difficult—and rewarding—courses of my PhD, and also became a great friend. Terry Higgins and Charlie Cweik helped shape me into a better teacher, for which I’m truly thankful.\nThis path into research wouldn’t have begun without Dr. Pritam Ranjan, who nudged me early on at IIM Indore through summer internships and research papers. His belief in me during those formative years set everything else in motion.\nLife in Knoxville would’ve felt a lot lonelier without the people who brought warmth and light. Nikhil Narayane has been a constant companion through shared road trips and memories. Yu Jiang has been a close and trusted friend, introducing me to Chinese culture (and hot pot!)—and I look forward to visiting China with her one day. Greeshma Geetha has a wonderful way of bursting my thought bubbles and grounding me in nature and environmental consciousness. 
Thanks to her, I notice the world more.\nPablo and I have had countless late-night debates on technology, programming languages, and the very fabric of the internet. To paraphrase Godwin’s Law—if most online conversations eventually devolve into references to Hitler, ours always loop back to the philosophy of tech. Samudra Dasgupta (and his dog Henley) brought color and chaos in the best way. Despite his quirks, Samudra is a kindred spirit, driven by the same idealistic goal—to make the world better than we found it.\nA surprise gift of my time at HP was meeting Dea Bardhoshi. Her curiosity and presence have sparked many thoughtful conversations. Our weekly serendipity calls have helped keep my own curiosity alive.\nAnd then there’s Meenal. My favorite person. Food for both my mind and heart. Her presence is an anchor, a mirror, and a source of joy. I find myself yearning for every next meeting with her.\nFinally, to my parents—Rajendra Prasad and Chandra Lata Barnwal—my brother Shashank, and my sister Shalini: your unwavering love and patience have sustained me in more ways than you know. Everything I do is built upon your sacrifices and support. I carry you with me, always.\nPhD in Pictures Assorted pictures from the last four years.\n","permalink":"/phd/","summary":"\u003cp\u003eOn March 31, 2025, I successfully defended my dissertation: “From Data to Decisions: Enterprise Demand Forecasting with Machine Learning.” My work is rooted in generalizable research at the intersection of machine learning, operations research, and organizational decision-making, grounded through a real-world implementation at HP Inc.\u003c/p\u003e\n\u003cp\u003eFinal accepted draft of my dissertation is available \u003ca href=\"https://www.harsh17.in/docs/2025_04_10_Doctoral_Dissertation.pdf\"\u003ehere\u003c/a\u003e.\u003c/p\u003e\n\u003cdiv id=\"what-is-my-dissertation-about\" class=\"section level1\"\u003e\n\u003ch1\u003eWhat is my dissertation about?\u003c/h1\u003e\n\u003cp\u003eDemand forecasting has a rich intellectual and practical history.\nAncient texts like the Indian \u003cem\u003eArthashastra\u003c/em\u003e (350 BCE) and Chinese Han Dynasty archives both emphasized blending qualitative judgment with quantitative grain records to estimate demand.\nFast forward to the industrial age, companies like Ford and Chrysler pioneered judgmental forecasting to support assembly lines.\nIn the 1960s, statisticians like Box, Jenkins, Holt, and Winters developed foundational time-series methods like ARIMA and exponential smoothing, which still serve as industry baselines.\u003c/p\u003e","title":"From Data to Decisions: The Story Behind My PhD"},{"content":"I’ve used a lot of AI in the last few years. I’ve also written a lot about it previously. Here are some of my previous posts around AI.\nImprovements in Artificial Intelligence, December 9, 2021 I wonder how this AI thing is going to shape up, March 3, 2023 How does GPT work? Understanding Generative AI Models, April 26, 2023 Four AI Chatbots other than ChatGPT, November 27, 2023 OpenAI’s GPT is a terrific idea, February 8, 2024 When ChatGPT (GPT-3.5) took the world by storm, I was actually offline in a Vipassana course. I only got to know about it after a while but it fascinated me that it could write a poem, albeit poorly. Now, I couldn’t make the joke “oh, it can’t write a poem anyway”. Creative jobs were up for grabs now.\nHowever, the more I think about it, the more I find AI complementing my intelligence rather than replacing it. 
My own curiosities that once required a lot of effort to answer can now be answered quickly and reliably (thanks, Perplexity!). Obscure technical questions about topics that no one except me cares about (like learning the Kaithee script and the Hindustani language) are so much better with Claude and ChatGPT. Often I find myself checking for updated information on Reddit Answers or Grok’s tweet search. Image generation by Grok is too good. Coding (especially debugging) is so much better today than without these LLMs. I have created projects with extensive help from LLMs, like Spotify Randomizer and jailbreaking my Kindle.\nSome of the newer writings on this website may have been “proofread” with the help of an AI. So far, I haven’t found any AI that’s able to write with my level of English or Hindustani. But, it’s amazing for correcting spelling mistakes and reorganizing text for better flow. I use it for that. Which tool specifically varies quite a bit over time and space.\nBut, like all authors say in their books, “all mistakes are mine alone”. If I say something to you because I had a faulty source, my explanation that it was a faulty source isn’t enough. Thus, if I present it to you, it has my word of confirmation and verification. I might still make an honest mistake, and in that case, PLEASE correct me.\nI think it goes back to what Douglas Adams wrote in 1999 about the internet:\nSo people complain that there’s a lot of rubbish online,… or that you can’t necessarily trust what you read on the web. Imagine trying to apply any of those criticisms to what you hear on the telephone. Of course you can’t ‘trust’ what people tell you on the web anymore than you can ‘trust’ what people tell you on megaphones, postcards or in restaurants. Working out the social politics of who you can trust and why is, quite literally, what a very large part of our brain has evolved to do. For some batty reason we turn off this natural scepticism when we see things in any medium which require a lot of work or resources to work in, or in which we can’t easily answer back – like newspapers, television or granite. Hence ‘carved in stone.’ What should concern us is not that we can’t take what we read on the internet on trust – of course you can’t, it’s just people talking – but that we ever got into the dangerous habit of believing what we read in the newspapers or saw on the TV – a mistake that no one who has met an actual journalist would ever make. One of the most important things you learn from the internet is that there is no ‘them’ out there. It’s just an awful lot of ‘us’.\nDouglas Adams’ prescient observations about internet skepticism parallel our current relationship with AI in striking ways. Just as he noted that we shouldn’t inherently trust internet content simply because it exists in a medium that requires significant resources to operate, we shouldn’t automatically trust (or distrust) AI outputs merely because they come from sophisticated models trained on vast datasets. Just as Adams pointed out there is no “them” on the internet, just lots of “us,” we should remember that AI systems are ultimately tools created by and for humans, reflecting our collective knowledge, biases, and limitations.\nWhat you can trust or not trust will be the job of institutions, as Yuval Noah Harari says in his interview with Kara Swisher. Thus, trust the institutions. To that effect, I have also created a Chrome extension with which you can select any text and then ask Perplexity or Grok to fact-check it for you (for free). 
By making fact-checking more accessible, we’re not just helping individuals make better decisions; we’re contributing to a broader ecosystem of trust and verification in the AI age.\nAnother side note on AI text: These language models were trained on all1 human text, and many people consider this an evil act in itself. I’m not so sure of my position. In one way, everything that humans have done so far was to advance humanity even further. No one gave Aristotle or Kalidasa money for translating or copying their works in many languages. Today, the problem has arisen primarily due to the speed at which this transmission has happened. Writings of yesterday might end up in a training dataset today. But philosophically, nothing has changed.\nAncient authors often didn’t seek personal attribution or compensation. Rather, they were supported by patronage from the royal courts — much like media agencies pay journalists today. Like Patanjali’s Yoga Sutras advanced our knowledge of medicine and surgery without any compensation to their author, I am certain these systems being created will result in even better creative works, which will fuel the next systems, supercharging creativity.\nThus, our works should be considered an homage to future civilizations rather than mere intellectual property to be guarded.\nI know it is easy for me to be philosophical; my bread and butter isn’t writing. Thus, I expect a lot of people, especially independent journalists and writers, to completely disagree with me.\nBut, this is my AI manifesto for this website. For this website (and for this website alone), I give OpenAI, Anthropic, Google, DeepSeek, and everyone else who wants to use my work to create something new express permission to use it however they see fit. I believe their creation will help future humanity, and thus I will have fulfilled my duty to make the world a better place.\nThis was inspired by Damola Morenikeji’s AI Manifesto. I’ve also requested this to be added to the list of AI manifestos.\nNot all literally, but figuratively. We are running out of training data to scale with, and now data curation — selecting the right set of data to fine-tune the model — is more rewarding than just dumping more data.↩︎\n","permalink":"/ai-manifesto/","summary":"\u003cp\u003eI’ve used a lot of AI in the last few years.\nI’ve also written a lot about it previously.\nHere are some of my previous posts around AI.\u003c/p\u003e\n\u003col style=\"list-style-type: decimal\"\u003e\n\u003cli\u003e\u003ca href=\"https://blog.harsh17.in/ai-improvements/\"\u003eImprovements in Artificial\nIntelligence\u003c/a\u003e, December 9,\n2021\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://blog.harsh17.in/ai2/\"\u003eI wonder how this AI thing is going to shape\nup\u003c/a\u003e, March 3, 2023\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://blog.harsh17.in/gpt/\"\u003eHow does GPT work? 
Understanding Generative AI\nModels\u003c/a\u003e, April 26, 2023\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://blog.harsh17.in/four-ai-chatbots-other-than-chatgpt/\"\u003eFour AI Chatbots other than\nChatGPT\u003c/a\u003e,\nNovember 27, 2023\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://blog.harsh17.in/openai-gpts/\"\u003eOpenAI’s GPT is a terrific\nidea\u003c/a\u003e, February 8, 2024\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eWhen ChatGPT (GPT-3.5) took the world by storm, I was actually offline in a Vipassana course.\nI only got to know about it after a while but it fascinated me that it could write a poem, albeit poorly.\nNow, I couldn’t make the joke “oh, it can’t write a poem anyway”. Creative jobs were up for grabs now.\u003c/p\u003e","title":"AI Manifesto"},{"content":"We read a lot of things online that make us go \u0026ldquo;wait, is that actually true?\u0026rdquo; or \u0026ldquo;what does that even mean?\u0026rdquo; Usually the next step is to open a new tab, paste the text into ChatGPT or Perplexity, and ask. TextSage cuts out those steps.\nWhat It Does Select any text on a webpage, right-click, and choose one of four options:\nExplain with ChatGPT \u0026mdash; opens ChatGPT with a prompt asking it to explain the selected text Explain with Claude \u0026mdash; same thing, but with Claude Fact Check with Perplexity \u0026mdash; sends the text to Perplexity with a fact-checking prompt Fact Check with Grok \u0026mdash; sends it to Grok instead That\u0026rsquo;s it. No setup, no API keys, no accounts to create (beyond whatever AI service you already use). The extension just builds a smart URL with your selected text and opens it in a new tab.\nPrivacy First TextSage collects zero data. No analytics, no tracking, no telemetry. The selected text goes directly from your browser to the AI platform you chose \u0026mdash; it never passes through any intermediary server. The extension is open source, so you can verify this yourself.\nHow It\u0026rsquo;s Built The extension is remarkably simple \u0026mdash; it\u0026rsquo;s built on Chrome\u0026rsquo;s Manifest V3 and uses the Context Menu API. The core logic is about 50 lines of JavaScript:\nRegister context menu items when the extension loads When clicked, grab the selected text Encode it into a URL with a pre-crafted prompt Open a new tab to the chosen AI platform There\u0026rsquo;s no background processing, no content scripts injected into pages, no storage. It\u0026rsquo;s essentially a smart URL generator that saves you the copy-paste-prompt cycle.\nInstall TextSage is available on the Chrome Web Store \u0026mdash; install it and it\u0026rsquo;s ready to use immediately. You can also check out the website for more details.\nThe source code is on GitHub.\n","permalink":"/textsage/","summary":"A Chrome extension that lets you explain or fact-check any selected text with a right-click.","title":"TextSage"},{"content":"Some friends of mine asked me how to make homestyle Indian curries. We Indians are good at it; we make many types of Sabzi सब्ज़ी that non-Indians simply call “curry”. Sabzi literally means vegetables. Today, I’m excited to share a versatile curry recipe that’s perfect for anyone looking to explore Indian cooking.\nThe recipe is a template or a base recipe for homemade Sabzi. 
You can use it with any vegetable of your preference or even Chicken (but cooking Chicken will take longer, so remember to adjust for that).\nAlso know that this is a home-cooked curry, not something you would find in an Indian restaurant. Thus, please don’t compare it with what you see on an Indian menu. Parts of it I learnt from watching my mother cook, and the rest from six years of cooking.\nThe Heart of Good Sabzi The beauty of Indian curry lies in its foundation — an aromatic base for vegetables to be fried into, and then turned into the curry. The vegetable (Sabzi) could be cauliflower (Gobi) resulting in Gobi Sabzi; eggplant/brinjal (Baigan Sabzi); mushrooms (Mushroom Sabzi); okra/ladyfingers (Bhindi Sabzi); or even a mixture of them (Mix Sabzi or Mix Veg).\nLet’s start with what you’ll need. Don’t worry if the ingredient list seems long – most of these items can be found in your local grocery store, and the rest are worth a special trip to an Indian market. Trust me, your taste buds will thank you!\nYour Shopping List For the aromatic base, gather:\n2-3 tablespoons of your preferred cooking oil1 2 teaspoons each of cumin and mustard seeds One large onion, finely chopped 4-5 cloves of garlic and a half-thumb-sized piece of ginger, chopped (or a tablespoon of ginger garlic paste) Two medium tomatoes Two medium potatoes Fresh green chilies (adjust based on your spice tolerance, 1 for low, 3-5 for high) Fresh cilantro for that final touch Salt to taste Masala (Spices)\nTurmeric (Haldi): Gives curry its signature golden color and has anti-inflammatory properties Garam Masala: A warming blend of ground spices, essential for most Indian dishes Red Chili Powder: Adds heat and color (start with less if you’re sensitive to spice) Additionally, you should get some spice mix. Traditional homes don’t use them but they’re necessary unless you know what proportion of ten different spices to use every single time. MDH or Everest are good brands for spice mix (found in Indian stores).\nMDH/Everest Sabzi Masala: Perfect for vegetable dishes Kitchen King by MDH: A versatile blend that enhances any curry’s flavor Don’t feel restricted by the labels—you can use Biryani masala in curry; off-label usage is okay. You can mix different masalas to create your own unique flavor profiles.\nRecommended Combination\nFor basic curry: Turmeric + Garam Masala + Sabzi Masala For extra flavor: Add Kitchen King to the basic combination For heat lovers: Add extra red chili powder to any combination Remember: Start with small amounts of masala and add more to taste - you can always add, but you can’t subtract!\nChoose Your Main Vegetable(s) You can use zero, one or multiple vegetables. Each vegetable has its own taste. Using no vegetable would be Tamatar-Aloo Sabzi (Tomato-Potato Curry).\nCauliflower: 1 medium head, cut into florets Eggplant: 2 medium, cubed Mixed vegetables: 2-3 cups of your choice Mushrooms: 8-10 oz, quartered Bell peppers: 1-2, chopped Instructions Base Curry Preparation Heat oil in a large pan over medium heat. Once hot, add cumin and mustard seeds. Wait for them to sputter (about 30 seconds). Add chopped onions. Cook until translucent (5-7 minutes). Add ginger and garlic. Sauté for 2 minutes. Add turmeric powder and stir for 30 seconds. Important: Add turmeric before water as it needs to cook in oil (in any cooking). Add garam masala and other dry spices. Stir for 1 minute. Add tomatoes, potatoes and chopped green chillies. Add salt. Cook until tomatoes soften (5 minutes). Add your chosen vegetable(s). 
Cooking Process Monitor moisture levels: If too dry: Add small amounts of water (1/4 cup at a time) If too wet: Cook uncovered until excess water evaporates Alternate between covered and uncovered cooking: Cover to help vegetables release water and steam Uncover to reduce liquid and intensify flavors Cook until vegetables reach desired tenderness (15-20 minutes). Garnish with fresh cilantro. Pro-Tips Spice Adjustment:\nStart with less spice; you can always add more. If you want to add more spices after adding water to the curry, you can do that in the Tadka (described below) or by heating the spices first in some oil. Taste frequently and adjust seasonings. You will get better with time. Tadka (Final Flavor Boost):\nHeat 2 tablespoons ghee or oil in a small pan\nAdd 2-3 dried red chilies\nAdd 1 teaspoon cumin seeds\nAdd 1/2 teaspoon red chili powder\nWhen spices sputter, pour over finished curry\nConsistency in Curry Thickness Different people have different preferences in the dryness of their curry. Some like dry curries (good for eating with bread) while others like watery curry (good with rice).\nFor dry curry: Use minimal water and cook uncovered For gravy: Add more water and simmer covered Ideal consistency: Vegetables should be well-coated with thick gravy Final Remarks and Troubleshooting Tips As long as you follow the basic recipe, there is very little chance of getting something wrong. But remember, it is okay to make mistakes and it’s likely you will make mistakes. But those mistakes will teach you how to cook better.\nThe most important ingredient in cooking is time. Remember not to hurry or be too late. Some people hurry to add water before the veggies are well fried in the spices. Others delay stirring, and the vegetable burns and sticks to the bottom. In that case, add some water and then stir instead of stirring dry.\nDifferent cooking oils have slightly different tastes. If you are unsure, choose refined vegetable oil. Don’t use sesame seed oil. Mustard oil will have a strong smell. You can use butter or ghee but you may need a lot; I’d recommend pairing it with some oil if you use butter or ghee.↩︎\n","permalink":"/making-an-indian-curry/","summary":"\u003cp\u003eSome friends of mine asked me how to make homestyle Indian curries. We\nIndians are good at it; we make many types of Sabzi सब्ज़ी that\nnon-Indians simply call “curry”. Sabzi literally means vegetables.\nToday, I’m excited to share a versatile curry recipe that’s perfect for\nanyone looking to explore Indian cooking.\u003c/p\u003e\n\u003cp\u003eThe recipe is a template or a base recipe for homemade Sabzi. You can\nuse it with any vegetable of your preference or even Chicken (but\ncooking Chicken will take longer, so remember to adjust for that).\u003c/p\u003e","title":"Recipe: Making a Homestyle Indian Curry"},{"content":" Try the quiz If you want to test your knowledge about happiness around the world, try this quiz I made based on World Happiness Report 2024.\nLink to Quiz (10 questions, 5 mins activity): https://happiness-quiz.netlify.app/\nWhat makes us happy? One of the early realizations of my life was that happiness depends more on perspectives than circumstances. A wise person would strive for a perspective that brings them happiness. So, why isn’t everyone happy? To a large extent, it’s because we are told what should make us happy: job, marriage, kids, and so on.\nBut happiness isn’t just a momentary thing! Like bad experiences, sensations that bring happiness eventually change. 
Even if you get a nice job, a better one might make you slightly happier, but only for a short while.\nLong-term studies have shown that lottery winners and individuals who lost a limb eventually revert to their previous “normal” levels of happiness.1 Even other life events like marriages and childbirth hardly affect happiness in the long term.\nMoney and Status Sociocultural factors like status and role in society do have an impact on happiness. Some claim money brings happiness, and that’s true—but only to a certain extent. Someone earning $150,000 isn’t necessarily happier than someone earning $100,000.2\nHowever, biologists insist that if happiness is measured as pleasure, it is determined by the balance of three biochemicals: oxytocin, dopamine, and serotonin.\nIf that’s the case, should people be drugged with the right amounts of these hormones? Thankfully, we’re saner than that—for the most part. Yet, this doesn’t stop us from doing things like putting VR headsets on cows to show them green pastures in farms. How cruel are we?3 Even if humanity achieves utopia, would it be utopia for the world? Most likely not.\nStill, all this discussion revolves around pleasure. Happiness itself remains a mystery biologically. Someone with cancer could feel happy, while a perfectly healthy person might not.\nHow do we measure happiness? Cantril Ladder The World Happiness Report ranks countries using the “Cantril Ladder,” developed by Princeton psychologist Dr. Hadley Cantril. Participants imagine a ladder with steps from 0 (worst life) to 10 (best life) and rate where they stand. In 2024, countries highest on the Cantril Ladder included Finland, Denmark, Iceland, Israel, and the Netherlands, while countries at the bottom included Afghanistan, Lebanon, and Sierra Leone. Below is the whole world on ‘Cantril Ladder’.\nThis measure of happiness correlates heavily with economic well-being (as measured by GDP per capita). There is an almost perfectly linear trend line.\nEven within countries, higher income results in a higher score on Cantril’s ladder — all trending upwards: as income increases, self-reported life satisfaction goes up.\nBut is comparing yourself with your “best possible self” and your “worst possible self” really the best measure of happiness?\nPositive Affect Another measure from the Gallup Poll asks, “Did you experience happiness yesterday?” (Positive Affect). It combines the answers to three Yes/No questions:\nDid you experience enjoyment during a lot of the day yesterday? Did you learn or do something interesting yesterday? Did you smile or laugh a lot yesterday? Here, Latin American countries like Paraguay, Panama, and Guatemala rank highest. Interestingly, different measures yield different results—Nordic countries often rank high for life satisfaction, while Latin American nations excel in daily happiness. At the bottom of these scales, Afghanistan has been the lowest globally since 2017.\nEven when happiness is measured through “Cantril’s Ladder”, Latin American countries are outliers.\nBut I don’t think either of the two questions measures happiness. Of course, these are two good questions but neither is perfect, because there is no perfect scale for measuring everyone’s happiness.\nWhat makes me happy? We’ve established that representing happiness, both biologically and in survey design, remains elusive.\nTherefore, I hypothesize: perspectives that make us happy require objective, unbiased, patient, and persistent observation. 
In other words, it requires observation of reality as it is, not as we would like it to be. A story from Buddha’s life illustrates this: On a cold winter night, Siddhartha Gautama, dressed in rags, was asked how he seemed so cheerful despite the harsh weather. Buddha replied that suffering does not come from the cold or the wind but from the mind. True contentment, he said, doesn’t depend on circumstances but on understanding the nature of the mind.\nEssentially, happiness is a mental game. Humans often misplace their pursuit of happiness by chasing fleeting external stimuli—wealth, power, relationships—believing they bring lasting joy. Buddha’s teachings take a radically different view. The core idea of anicca (impermanence) is that everything in life—emotions, sensations, possessions, and even our identities—is transient. Suffering arises when we cling to these impermanent things. Vipassana (a special way to “see”) was Buddha’s method of teaching people to directly observe their sensations and see reality as it is.\nBuddha also taught anapana (observation of respiration) as a “deliciously pleasant way of living.” So, what is happiness, if you ask me? It is the absence of suffering. If you’re not happy, you’re likely experiencing suffering. And what is peace? Happiness at rest. Happiness is closer to joy than pleasure.\nDeath, the fact that our entire life is impermanent, adds reasoning to happiness. Indeed, as Meenal says:\nWhat does death teach us about being alive? That it is not permanent, and probably a punishment if it becomes permanent. The most comfortable life imaginable will become a prison for you once you have to live it forever. There are no ups without downs. It just becomes a plain never-ending vast surface.\nStriving to know what makes me happy has always been important to me, maybe because my name Harsh (हर्ष) means happiness in Sanskrit/Hindi. Harshvardhan (हर्षवर्धन) means one who increases happiness.\nAnyway, thanks for coming to my TED Talk. Maybe you’ll decide to find what makes you happy—or maybe not, but please do. Because more likely than not, it is different from everyone else’s in the world. In any case, I hope we’ve learned a thing or two about happiness.\nSome Fun Facts from World Happiness Survey The younger generations (millennials and Gen Z) are more likely than their predecessors to help others in need, especially post-COVID. Financial well-being (GDP per capita), social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption together can explain 78% of the variation in life satisfaction (Cantril ladder score; regression adjusted R² = 0.78). Older age is associated with higher life satisfaction in India, refuting some claims that the positive association between age and life satisfaction only exists in high-income nations. However, older women in India report lower life satisfaction than older men. Additional Readings How they find the “happiest” country on Earth, YouTube Happiness and Life Satisfaction by Our World in Data World Happiness Report, 2024 Brickman, P., Coates, D., \u0026amp; Janoff-Bulman, R. (1978). Lottery winners and accident victims: Is happiness relative?. Journal of Personality and Social Psychology, 36(8), 917.↩︎\nThe actual number seems to be somewhere between $75,000 and $100,000, beyond which money becomes less important to happiness. See Does Money Buy Happiness? 
Here’s What the Research Says.↩︎\nRussian cows get VR headsets ‘to reduce anxiety’, BBC, November 27, 2019.↩︎\n","permalink":"/happiness/","summary":"\u003cdiv id=\"try-the-quiz\" class=\"section level2\"\u003e\n\u003ch2\u003eTry the quiz\u003c/h2\u003e\n\u003cp\u003eIf you want to test your knowledge about happiness around the world, try this quiz I made based on \u003ca href=\"https://worldhappiness.report/\"\u003eWorld Happiness Report 2024\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eLink to Quiz (10 questions, 5 mins activity): \u003ca href=\"https://happiness-quiz.netlify.app/\" class=\"uri\"\u003ehttps://happiness-quiz.netlify.app/\u003c/a\u003e\u003c/p\u003e\n\u003chr /\u003e\n\u003c/div\u003e\n\u003cdiv id=\"what-makes-us-happy\" class=\"section level2\"\u003e\n\u003ch2\u003eWhat makes us happy?\u003c/h2\u003e\n\u003cp\u003eOne of the early realizations of my life was that happiness depends more on perspectives than circumstances.\nA wise person would strive for a perspective that brings them happiness.\nSo, why isn’t everyone happy?\nTo a large extent, it’s because we are told what should make us happy: job, marriage, kids, and so on.\u003c/p\u003e","title":"Happiness: What makes us happy?"},{"content":"I’ve been in Knoxville for more than three years now since I started my PhD in Analytics at the Haslam College of Business, University of Tennessee. During this time, I have explored a fair number of the cafes around. Not all cafes, but most cafes within a 5-mile radius of the university. That covers almost every cafe in downtown Knoxville, the university area, and South Knoxville.\nI’ve had many friends and colleagues ask me about my favorite spots for coffee, so I thought it would be helpful to share my experiences. Note that I haven’t been to enough cafes in West or North Knoxville, so they aren’t part of this list.\nInitially, I thought I would rate them out of five. But that’s really hard. How do I differentiate between three 4/5 cafes?\nI’m reading The Anthropocene Reviewed by John Green, where he talks about the fundamental flaw of the five-star scale. He aptly says:\nThe five-star scale doesn’t really exist for humans; it exists for data aggregation systems, which is why it didn’t become standard until the internet era.\nSince this post isn’t a replacement for Google Maps (or our own Map of Tiny Perfect Things), I am only going to provide you with a few lines about my favourite cafes around the area.\nAnother note: I usually get a Cortado or a Latte from these cafes.1 The reviews will reflect that. (Pro-tip: Several places can/will provide you with seltzer water along with your Cortado. Using carbonated water as a palate cleanser will let you taste every sip of your coffee as the first sip. Just ask your barista if they can!)\nMy Fav Cafe: Old City Java I am currently writing this from Old City Java, a cafe in downtown Knoxville. Hands down, this place has the best Cortado in Knoxville. It also has a mini-library that you can contribute to or borrow books from.\nCortado from Old City Java, best Cortado in Knoxville. August 7, 2024.\nHonourable Mentions Mahalo Coffee Roasters for its modern interiors and great Cortados (as good as Old City Java). Coffee Underground for its vibe \u0026amp; inviting atmosphere on campus. They also organise open mic events often. Kelly, the owner, can often be spotted working there — which is a rarity for most businesses. They have freshly roasted coffee beans too. (I was their first customer to buy beans. Beans: 10/10.) Honeybee is great for their croissants and bakery items. 
But make sure you get there before 12 noon, as they run out pretty quickly. Seed Coffee Co. \u0026amp; Remedy Coffee are great for spaces with larger groups and comfortable all-day seating. I also got reading/writing done here on many days. Golden Roast \u0026amp; Capybara have the best coffee around the university campus. Coffee \u0026amp; Chocolate is for “okay” coffee but good desserts (macaroons, artisanal chocolates). It is also the only place open till 10 pm that serves actual coffee.2 Frothy Monkey has good coffee and also serves food. Also open till 9 pm. I’ve done omelettes and coffee dinners here several times. Jack’s Coffee and Plants has backdoor/balcony seating amongst trees. They also allow pets. Awaken Coffee is a cafe that’s open till 7 pm. Good coffee as well. I’ve written parts of my dissertation here, so this place is special. Fine \u0026amp; Hoek is originally a coffee roaster, so this is where you can buy your beans from. This post is NOT sponsored by anyone.\nView my coffee collection on Are.na →\nCortado translates to cut, meaning that the coffee is cut with milk. A cortado coffee is made of equal parts espresso and steamed milk. It is often made with a double shot of espresso.↩︎\nPablo, if you’re reading this: No, Urban Bar and Corner Cafe doesn’t count.↩︎\n","permalink":"/best-cafes-in-knoxville/","summary":"\u003cp\u003eI’ve been in Knoxville for more than three years now since I started my PhD in Analytics at the Haslam College of Business, University of Tennessee.\nDuring this time, I have explored a fair number of the cafes around.\nNot all cafes, but most cafes within a 5-mile radius of the university.\nThat covers almost every cafe in downtown Knoxville, the university area, and South Knoxville.\u003c/p\u003e\n\u003cp\u003eI’ve had many friends and colleagues ask me about my favorite spots for coffee, so I thought it would be helpful to share my experiences.\nNote that I haven’t been to enough cafes in West or North Knoxville, so they aren’t part of this list.\u003c/p\u003e","title":"Best Cafes in Knoxville"},{"content":"I never knew I could miss a piece of tech until I found myself longing for my old Kindle. The compact 6\u0026quot; form factor that fit perfectly in my hand, easily slipped into my back pocket, and allowed me to read while walking was irreplaceable. The Kindle Scribe I upgraded to had its perks\u0026mdash;now all my books and notebooks were with me all the time, no cloud sync needed. But it was just too big to carry around.\nOn a whim, I started looking for a replacement and found a Kindle 7th generation on eBay for $30. It lacked a backlight, but its small, lightweight design made it perfect for on-the-go reading. (Though, my friends later snagged a Kindle Paperwhite for the same price, so maybe I overpaid a bit.)\nWith my new device in hand, I got curious\u0026mdash;could I hack my Kindle? Could I make it do more than it was designed to do? I dove into the world of Kindle hacks and decided to jailbreak it, following detailed instructions on the Mobileread forum. This allowed me to run Python, install apps like a hand-drawing tool, and even use a calculator on my Kindle. (\u0026ldquo;How-To\u0026rdquo; guide later.)\nAdditionally, I switched to using KOReader instead of the default Kindle reader. KOReader is a Kindle reader for nerds. It offers more features and customization options, like a book map showing where I am in a book and reading analytics.\nKOReader is a document viewer for E Ink devices. 
Supported fileformats include EPUB, PDF, DjVu, XPS, CBT, CBZ, FB2, PDB, TXT, HTML, RTF, CHM, DOC, MOBI and ZIP files. It\u0026rsquo;s available for Kindle, Kobo, PocketBook, Android and desktop Linux.\nFull list of KOReader features is here.\nOne of my favorite customizations was setting my own photo album as a screensaver. The default screensavers are nice, but there\u0026rsquo;s something special about seeing a random new, familiar photo every time I unlock my Kindle.\nSince it\u0026rsquo;s open-source and written in Lua, I even managed to modify it to block lockscreen ads\u0026mdash;a simple yet satisfying tweak that kept my reading experience ad-free without paying Amazon\u0026rsquo;s $20 fee to remove ads.\nNow, my Kindle feels truly mine, customized to my liking.\nMy new Kindle is powerful. I have a collection of posters rotating as screensavers. I can draw. I can run Python scripts!\nReading statistics on KOReader.\nBook maps on KOReader show each chapter\u0026rsquo;s length with titles like \u0026ldquo;Teddy Bears\u0026rdquo;, \u0026ldquo;The Internet\u0026rdquo;, \u0026ldquo;CNN\u0026rdquo;, etc. The dark bars are how much time I spent reading that section, shaded bars are my highlights, the triangle is the current position. It is a great example of data visualization and it makes me happy that it\u0026rsquo;s my own reading habits.\nOther reading metrics, which can be analysed daily, weekly, or monthly, are amazing to have. Contrast that with the basic Kindle, where no such reading analytics are available. Maybe something for the Amazon team?\nWhat is \u0026ldquo;jailbreaking\u0026rdquo;? Jailbreaking is a process through which a user can get \u0026ldquo;admin\u0026rdquo; access to various functions of a device. In practical terms, it gives you the ability to install apps on your Kindle. You\u0026rsquo;re not limited by what\u0026rsquo;s allowed by Amazon on this device; rather, you are only limited by the device\u0026rsquo;s low processing power.\nUnlike Android phones or iPhones, whose warranties get voided if you jailbreak them and which are hard to \u0026ldquo;unjailbreak\u0026rdquo;, jailbreaking a Kindle is quite risk-free. Jailbreaking will only install a new app called KUAL, which is just an application launcher. To \u0026ldquo;unjailbreak\u0026rdquo; a Kindle, you need to factory reset the device (Settings -\u0026gt; Reset), and manually update your Kindle. Plus, you never void any warranties.\nNerdy Details: Kindle isn\u0026rsquo;t a very powerful device though. It uses a Freescale i.MX6 SoloLite processor, which is a 1 GHz single-core ARM Cortex-A9 chip. This processor is designed for low-power, high-efficiency applications, making it suitable for e-readers like the Kindle, which primarily handle text rendering and basic navigation tasks.\nTutorial: How to Jailbreak? I used WatchThis tutorial for step-by-step instructions on jailbreaking my Kindle. You will need a cable to connect your laptop to your Kindle and a working internet connection.\nHow to remove ads? When you buy a new Kindle, you have two options: buy a lockscreen ad-supported one or one without ads. Typically, there is a price difference of $20, which you can also pay later to remove the ads if you\u0026rsquo;d like. The ads aren\u0026rsquo;t intrusive; they\u0026rsquo;re engaging and typically recommend new books to you based on your reading/Goodreads history.\nThere used to be a time when you could call customer care and request that they remove the ads, which they\u0026rsquo;d gladly do. 
Not anymore.\nThen, I realised I could create a user patch with some Lua code and, lo and behold, there were no ads! You should create a Lua file named 2-i-m-not-special-need-no-offers.lua:\nlocal Device = require(\u0026#34;device\u0026#34;)
-- Report screensaver support so KOReader draws its own sleep screen
-- instead of deferring to the stock (ad-serving) lockscreen.
Device.supportsScreensaver = function() return true end
-- Re-initialise the wakeup manager so sleep/wake scheduling still works.
Device.powerd:initWakeupMgr()
You can call it something else too, but the \u0026ldquo;2\u0026rdquo; in front is necessary. Save this file inside the koreader\patches folder. (If you don\u0026rsquo;t know what I\u0026rsquo;m talking about, know that you need to have jailbroken your Kindle AND installed KOReader already for this to work.)\nSome Book Shots Link to Are.na channel.\nSources All related resources are in this links folder.\n","permalink":"/kindle/","summary":"Jailbreaking my Kindle to run Python, draw, and block lockscreen ads.","title":"Jailbreaking: Turning My Kindle to A Juggernaut E-Reader"},{"content":"Estimating the total number of species that have ever existed on Earth is highly challenging due to the vast diversity and complexity of life. Current estimates of the number of living species vary widely, from around 3 million to over 100 million. One of the more widely cited figures is approximately 8.7 million species currently on Earth, which includes 6.5 million on land and 2.2 million in the ocean.\nBut knowing the exact number is really hard. As Robert May summarised in a paper published in Science:1\nIf some alien version of the Starship Enterprise visited Earth, what might be the visitors’ first question? I think it would be: “How many distinct life forms—species—does your planet have?” Embarrassingly, our best-guess answer would be in the range of 5 to 10 million eukaryotes (never mind the viruses and bacteria), but we could defend numbers exceeding 100 million, or as low as 3 million.\nBut most — 99.9% — of all species to ever exist aren’t alive today, i.e. are extinct. The largest extinction event, known as the Great Dying2, took place around 250 million years ago and wiped out 96% of species. Another extinction that occurred 65 million years ago eliminated about 76% of all plants and animals.\nEvery now and then, I remind people that we are all transient; that’s the nature of reality. Our current power as humans is due to sheer luck, and it won’t last forever. Indeed, we humans only evolved as Homo sapiens 200,000 years ago, with the earliest members of our genus, Homo, appearing around 2 million years ago. (Cockroaches3 and mosquitoes4 have been alive for at least 300-325 and 217 million years, respectively.)\nNature has repeatedly wiped out species from the face of the earth—five times already, not including the ongoing sixth extinction. Let’s look at these in some detail.\nBig Five Extinction Events Even though species extinction has been a continuous process throughout history, five outliers stick out like a sore thumb.\nIn a landmark paper published in 1982, Jack Sepkoski and David M. Raup identified five significant geological intervals characterized by excessive diversity loss. Initially, these intervals were seen as outliers in a general trend of decreasing extinction rates during the Phanerozoic Eon. 
(The Phanerozoic is the current and most recent of the four main divisions of Earth’s history, spanning from 538.8 million years ago to today.)\nHowever, as more rigorous statistical tests have been applied to accumulating data, it has become clear that multicellular animal life in the Phanerozoic Eon has experienced at least five major and many minor mass extinctions.\nConsider this spiral timeline of our history: starting from the formation of the Moon 4,510 million years ago to the birth of Hominins (the group of species that includes modern humans (Homo sapiens) and our closest relatives after our lineage split from the chimpanzees).5\nThe geologic time scale, proportionally represented as a log-spiral with some major events in Earth’s history. A megaannus (Ma) represents one million (10⁶) years. Big five mass extinctions are denoted as X.\nBy Jarred C Lloyd - Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=147428651\nDuring this time, a wide variety of animal and plant life has thrived, spread, and adapted to different environments. This eon began with the Cambrian period, when animals first developed hard shells (or bones) that are well-preserved in fossils.\nLet’s see the timeline of the “Big Five” Mass Extinctions in more detail.\n“There have been five mass extinctions in Earth’s history” by Hannah Ritchie (2022). Our World in Data. Source: https://ourworldindata.org/mass-extinctions\nWhat were the causes, impacts, timelines, and species that went extinct for each of them?\n1. Ordovician-Silurian Extinction (approx. 443 million years ago) A significant glaciation event, known as the Hirnantian glaciation, occurred towards the end of the Ordovician period. This glaciation caused a dramatic drop in global temperatures and a corresponding drop in sea levels as large amounts of water were trapped in ice sheets. The resulting habitat loss in shallow marine environments led to widespread species extinctions.\nThis cooling also resulted in widespread changes to oceanic and atmospheric circulation patterns. Reduced oxygen levels in the oceans, a condition known as anoxia, may also have occurred due to changes in ocean circulation and productivity. Anoxic conditions can be deadly for many marine organisms, particularly those in deeper waters.\nHere’s another wild hypothesis. Some scientists say that the initial extinctions might have been sparked by a gamma-ray burst from a hypernova in a nearby arm of the Milky Way, just 6,000 light-years away. A ten-second burst could have destroyed half of Earth’s ozone layer6, leaving surface-dwelling organisms, including the ones keeping our planet green with photosynthesis, exposed to intense ultraviolet radiation.\nCauses: Severe ice age, falling sea levels, possible gamma-ray burst; followed by melting glaciers and a rapid warming phase. Impact: About 85% of marine species went extinct. Examples of Extinct Species: Many brachiopods, bryozoans, and trilobites. Timeline: Occurred over a period of about 1 million years. 2. Late Devonian Extinction (approx. 375-359 million years ago) By the Late Devonian, the land had been colonized by plants and insects. In the oceans, massive reefs were built by corals. The continents were arranged quite differently than they are today. Gondwana, a supercontinent, dominated the Southern Hemisphere. In the Northern Hemisphere, the continent of Siberia held its position, while Laurussia—formed from the collision of Baltica and Laurentia—drifted towards Gondwana, gradually closing the Rheic Ocean. 
The Caledonian mountains were rising in what is now the Scottish Highlands and Scandinavia, while the Appalachians were forming in America.\nThis was the period that killed most of the coral reefs, which evolved again a few hundred million years later. This mass extinction was a two-pulsed event: the two extinction pulses being separated by an interval of approximately 800,000 years. The second pulse was more severe than the first.\nCauses: Possible asteroid impacts, climate change, widespread ocean anoxia. Impact: Around 75% of species, particularly affecting marine life. Examples of Extinct Species: Numerous fish species and coral reefs. Timeline: Spanned several million years, with multiple extinction pulses. 3. Permian-Triassic Extinction (approx. 252 million years ago) The most severe of the mass extinctions gave this period the name: The Great Dying. During this phase, Earth’s average temperature increased by 8 °C (14 °F) and CO2 levels increased to 2,500 ppm (for comparison, the concentration immediately before the Industrial Revolution was 280 ppm and now it’s about 415 ppm). It is also associated with a sharp increase in the abundance of marine and terrestrial fungi, caused by the surge in the amount of dead plants and animals fed upon by the fungi.\nThis extinction period is also marked by the absence of coal — all coal-forming plants were likely killed and it took another ten million years for a new suite of plants to form peat. The most significant cause was volcanic eruptions in the Siberian Traps, which led to one of the most rapid rises of atmospheric carbon dioxide levels in the geologic record. This led to global warming: surface land temperature increased by 17 °C, surface water temperature increased by 8 °C, and air temperature increased by around 12 °C. The eruptions also released trapped halogens, which are extremely destructive to ozone; 70% of the trapped halogens were released into the atmosphere.\nCauses: Volcanic activity in the Siberian Traps, climate change, ocean acidification, anoxia. Impact: Largest extinction event, with about 96% of marine species and 70% of terrestrial vertebrates going extinct. Examples of Extinct Species: Trilobites, large amphibians, many types of reef-building organisms. Timeline: Occurred over a period of about 60,000 years to a few hundred thousand years. 4. Triassic-Jurassic Extinction (approx. 201 million years ago) Towards the end of the Triassic period, the fourth mass extinction took place, marking the boundary between the Triassic and Jurassic periods. The primary cause was massive volcanic activity associated with the Central Atlantic Magmatic Province (CAMP). Again, these events led to increased atmospheric CO2, ocean acidification, anoxia, and changes in water currents. CAMP also increased the amount of toxic mercury in the environment, killing many organisms by mercury poisoning.\nThis extinction also cleared a lot of land space, paving the way for dinosaurs to be the apex predators of the planet. Some species of dinosaurs went extinct, but most lived on to be at the top of the food chain.\nThis extinction event is also quite similar in nature to the Anthropocene extinction. If human-induced climate change — increased carbon dioxide levels, ocean acidification, and ocean deoxygenation — persists as is, predictions can be made as to how various aspects of the biosphere will respond based on these records.\nCauses: Likely volcanic activity, climate change, and rising sea levels. Impact: Around 80% of species went extinct. 
Examples of Extinct Species: Many marine reptiles, large amphibians, certain genera of early dinosaurs. Timeline: Occurred over a period of less than 10,000 to several tens of thousands of years. 5. Cretaceous-Paleogene Extinction (approx. 66 million years ago) The fifth mass extinction was triggered by a massive asteroid impact in the Yucatán Peninsula, in present-day southeast Mexico, creating the Chicxulub crater, named after the Mexican town where it was found (pictured below). This impact released enormous amounts of energy, causing fires, tsunamis, and a “nuclear winter” effect, where debris blocked sunlight, halting photosynthesis and drastically altering the climate. The oceans became significantly more acidic, which, obviously, affected marine life. Around the same time, volcanic eruptions in the Deccan region of present-day India also led to significant changes in global climate.\nThis mass extinction resulted in the annihilation of approximately 75% of the world’s known species, including most dinosaurs, marine reptiles, and more. The extinction severely affected marine species, particularly those dependent on phytoplankton. Only about 13% of species that relied on marine phytoplankton survived. On land, the extinction was catastrophic for large vertebrates, including all non-avian dinosaurs. Birds, the only dinosaur lineage to survive, and mammals began to diversify and evolve rapidly in the aftermath.\nHowever, the extinction event also opened up evolutionary opportunities, leading to significant adaptive radiation. In its aftermath, many groups rapidly diverged into new forms and species to fill the newly available ecological niches. Mammals, in particular, underwent extensive diversification, evolving into various forms including horses, whales, bats, and primates. The surviving dinosaurs were primarily those capable of flight, which evolved into the modern species of birds.\nCauses: Asteroid impact (Chicxulub crater), volcanic activity in the Deccan Traps, climate change. Impact: About 75% of species, including all non-avian dinosaurs, went extinct. Examples of Extinct Species: Tyrannosaurus rex, Triceratops, various marine reptiles. Timeline: Occurred rapidly, possibly within a few years to a few decades. 6. Holocene/Anthropocene Extinction (current) We are currently, in a systematic manner, exterminating all non-human living beings.\n— Anne Larigauderie, IPBES, United Nations\nThe Holocene or Anthropocene mass extinction is the ongoing mass extinction (as I write!), caused by humans alone. Our activities have resulted in widespread degradation of biodiversity hotspots such as coral reefs and rainforests. E.g., coral reefs cover 0.1% of the ocean floor but house over 25% of all marine species. The current rate of extinction is estimated to be 100 to 1000 times more than the natural background extinction rate. More species are going extinct in Asia than elsewhere.7\nThis likely started with megafauna (large mammal) extinctions around 50,000 years ago, continuing till around 12,000 years ago, when the woolly mammoth became extinct. Human hunting and competition from other mammals were a large reason for their decline.\nLike all facts that haven’t become history yet, this ongoing extinction is still questioned by some scientists. 
I don’t think you need convincing that we humans have irrevocably changed the ecology of the planet, but if you still doubt it, here’s a lot more evidence.\n(Embedded post by @harsh17.in; view on Threads.)\nCauses: Human activities (habitat destruction, pollution, overhunting, climate change, invasive species). Impact: Estimated that species are going extinct at 100 to 1,000 times the natural background rate, with significant declines in biodiversity. The International Union for Conservation of Nature (IUCN) reports that over 26,000 species are threatened with extinction. Examples of Extinct Species: Passenger pigeon, Western black rhinoceros, various amphibians. Timeline: Ongoing, with accelerated rates of extinction over the past few centuries. The current rate suggests that around 1 million species could face extinction in the coming decades. Concluding Thoughts You may notice that each extinction has happened more quickly than the one before: the first and second took over a million years, the third 60,000 years, the fourth 10,000 years, the fifth only a few decades. Will the sixth (ongoing) one kill \u0026gt;80% of species in just a few years?\nThe other interesting thing is that nature has routinely killed over 70% of species in every mass extinction. Indeed, over 99% of all species (not organisms but individual species) have gone extinct over the course of Earth’s history. But we (still) have so much diversity in life today. What would the world be like in a few million years?\nIt has been well said that forests precede mankind; deserts follow. We love to say “save the planet” but actually, we are striving to “save ourselves”. In the words of Lester Brown, “We have not inherited this earth from our forefathers; we have borrowed it from our children”.8\nView my Are.na channel on Economics et al. →\nMay, R. M. (2010). Tropical arthropod species, more or less?. Science, 329 (5987), 41-42.↩︎\nNASA has a great explainer on the Great Dying. The causes aren’t clear to us. Scientists have suggested many possible causes for the Great Dying: severe volcanism, a nearby supernova, environmental changes wrought by the formation of a super-continent, the devastating impact of a large asteroid — or some combination of these (my guess too). We are able to learn more about this thanks to trapped cosmic gases.↩︎\nCockroaches date back to the Carboniferous period, approximately 300-325 million years ago. This period is marked by the oldest putative fossil evidence of stem-dictyoptera, which includes cockroaches (Legendre et al., 2015).↩︎\nMosquitoes are an ancient group – around 217 million years old – that probably originated in South America before it was South America, on one big land mass called Gondwana that hadn’t yet split apart. Source.↩︎\nThis group includes all species on our branch of the evolutionary tree after this split, encompassing extinct species such as Homo neanderthalensis (Neanderthals), Homo erectus, and Australopithecus species. 
Hominins are characterized by traits such as bipedalism (walking on two legs), larger brain sizes relative to body size, and more complex tool use compared to other primates.↩︎\nOzone (O3) + Gamma Ray = Oxygen (O2) + Oxygen Free Radical (O)↩︎\nPimm SL, Jenkins CN, Abell R, Brooks TM, Gittleman JL, Joppa LN, Raven PH, Roberts CM, Sexton JO (30 May 2014). “The biodiversity of species and their rates of extinction, distribution, and protection”↩︎\nMy first introduction to persuasive arguments, not scientific but gut wrenching and convincing, on climate change was Nani Palkhivala’s “The Ailing Planet: The Green Movement’s Role”. It was part of my NCERT textbook in Class 11, though originally published in The Indian Express.↩︎\n","permalink":"/the-ongoing-sixth-mass-extinction/","summary":"\u003cp\u003eEstimating the total number of species that have ever existed on Earth is highly challenging due to the vast diversity and complexity of life.\nCurrent estimates of the number of living species vary widely, from around 3 million to over 100 million.\nOne of the more \u003ca href=\"https://ourworldindata.org/how-many-species-are-there\"\u003ewidely cited figures\u003c/a\u003e is approximately 8.7 million species currently on Earth, which includes 6.5 million on land and 2.2 million in the ocean.\u003c/p\u003e\n\u003cp\u003eBut knowing the exact number is \u003ca href=\"https://ourworldindata.org/how-many-species-are-there\"\u003ereally hard\u003c/a\u003e.\nAs Robert May summarised in a paper published in \u003cem\u003eScience\u003c/em\u003e:\u003ca href=\"#fn1\" class=\"footnote-ref\" id=\"fnref1\"\u003e\u003csup\u003e1\u003c/sup\u003e\u003c/a\u003e\u003c/p\u003e","title":"The Ongoing Sixth Mass Extinction"},{"content":"\nTL;DR: I created my digital avatar chatbot and you can talk to him at https://harsh17.in/chat.\nIntroducing My Chatbot: A New Way to Learn About My Work I\u0026rsquo;m excited to share a new project I\u0026rsquo;ve been working on: a custom chatbot designed to help you learn more about my work, research, and interests. As someone deeply involved in the field of Business Analytics and Statistics, I often get questions about my projects, research areas, and professional background. To make the process of getting to know me fun and accessible, I created a chatbot that can provide detailed answers about my work and my interests.\nWhy a Chatbot? A chatbot serves as an ideal medium to answer queries in real-time, providing instant responses on a wide range of topics related to my academic and professional journey. From discussing my dissertation on predictive optimization frameworks to answering questions by looking up my blog, this chatbot can be quite handy.\nKey Features It is completely free to use. You only need to have a ChatGPT account (free or premium). It can search my work (blog, newsletter, etc.) online to answer your question. Hopefully, it does that automatically by \u0026ldquo;learning\u0026rdquo; when a search is needed; if not, just suggest that it do so.\nLooking up blogs is especially useful when asking personal questions, like \u0026ldquo;What does Harsh think about how this AI thing is going to shape up?\u0026rdquo;\nHow It Works The chatbot leverages GPT-4o to understand and respond to queries. By accessing structured information and contextual data from my CV, Resume, and blogs from the internet, it can provide accurate and detailed responses. Whether you\u0026rsquo;re curious about my latest research paper or want to know more about a specific project, the chatbot is here to help.
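If you are curious what the plumbing behind something like this could look like, here is a minimal, hypothetical sketch using the OpenAI Python client. To be clear, this is not my actual setup (mine is a custom GPT configured inside ChatGPT, with my CV and blog attached as searchable knowledge); the file name and prompts below are made up for illustration.

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Hypothetical context file; the real chatbot searches the CV and blog itself.
with open('cv_and_blog_excerpts.txt') as f:
    context = f.read()

response = client.chat.completions.create(
    model='gpt-4o',
    messages=[
        {'role': 'system',
         'content': 'You are a digital avatar of Harshvardhan. '
                    'Answer only from this context:\n' + context},
        {'role': 'user', 'content': 'What is the dissertation about?'},
    ],
)
print(response.choices[0].message.content)

The point of the sketch is the shape of the idea: ground the model in curated personal context, then let it answer questions in real time.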
Usual caveats with AI tools apply. It can make up information, give wrong information, etc. If it says something crazy, it is very likely I didn\u0026rsquo;t say that. It will usually point you to the source of the original work from me; please follow the source.\nTry It Out You can talk to him at https://harsh17.in/chat. Any feedback is welcome. You can email me at hello@harsh17.in or fill this contact form.\n","permalink":"/talk-to-harshvardhan/","summary":"I created my digital avatar chatbot that you can talk to for free at \u003ca href=\"https://harsh17.in/chat\"\u003ehttps://harsh17.in/chat\u003c/a\u003e","title":"Talk to Harshvardhan"},{"content":" All the ones marked * are freely available at the link.\nThis will be updated over time.\nThe following is the grand list of recommendations from me. Currently, it has reading, listening, and watching recommendations. It may be updated with other recs, but I can’t promise. The last time I posted a Rec-List was three years ago on my father’s birthday.\nRead 📑 Short Reads Short Stories Collections\nAxiomatic by Greg Egan (Well-thought-out Sci-Fi stories) Stories of Your Life and Others by Ted Chiang1 (Anything written by Ted Chiang must be read) Exhalation: Stories by Ted Chiang Land of Big Numbers: Stories by Te-Ping Chen The Very Best of R.K. Narayan by R.K. Narayan Short Stories\nThe Red Convertible by Louise Erdrich* The Elephant Vanishes by Haruki Murakami प्रेमचंद के फटे जूते by हरिशंकर परसाई Thoughtful Non-fiction\nMeditations on Moloch by Scott Alexander* Travelling at the Speed of the Soul by Nick Hunt* The Art of Chicken Sexing by Richard Horsey* Hindi / Urdu / Hindustani2\nबकर पुराण by अजीत भारती Short Stories by Saadat Hasan Manto* Poems If by Rudyard Kipling* Geetanjali by Rabindranath Tagore* Howl by Allen Ginsberg* (Especially this part.) I’ve an Are.na channel for this. Long Reads Story Books (Fiction/Historical Fiction)\nCuckold by Kiran Nagarkar The Sense of an Ending by Julian Barnes The Humans by Matt Haig Kafka on The Shore by Haruki Murakami Non-Fiction\nCity of Djinns by William Dalrymple Wisdom\nPanchatantra by Vishnusharma (Translated by Nilanjana S. Roy) 21 Lessons for the 21st Century by Yuval Noah Harari Dhammapada (Buddha’s Poems, Translated by Gil Fronsdal) Audio-version The Art of Living (on Vipassana) by William Hart* The Almanack of Naval Ravikant by Eric Jorgenson* The Richest Man in Babylon by George S. Clason Listen 🎧 Podcasts The Empire podcast covers the British Empire, Roman Empire, Ashoka’s Empire, etc., and their impact on India and Asia. The episodes are phenomenal, detailed, and cover aspects of history that I had no clue about! I’ve written about Warren Hastings and Robert Clive based on what I listened to here. I’d recommend the first seven episodes in a row about the East India Company, the 1857 rebellion, Gandhi3, Jinnah (Founding Father of Pakistan), Mountbatten (Last Viceroy of India), and the Partition of India and birth of Pakistan.\nAudiobooks Gilgamesh’s Audiobook translated by Stephen Mitchell (an excellent translator by the way4) and recited by George Guidall is so good.\nWatch 📽️ Series Apple TV+\nSeverance (What if you separated your work half from the other half?) 
Prehistoric Planet (An unreal documentary on dinosaurs) Extrapolations (What if we breach the 1.5 °C temperature goal, imagined from NYC, SF, Mumbai, and more) Netflix\nThree Body Problem (A very good Sci-Fi series based on a Chinese novel) Thermae Romae Novae (A Roman bath architect time-travels to modern Japan for inspiration, an anime) Link to this list: https://blog.harsh17.in/read-listen-watch\nIf you liked this post and would like to stay updated, share your email here. Promise, no spam/bots.\nThe movie “Arrival” was based on his short story “Story of Your Life”. You might be able to find individual stories if you’d like. “The Merchant and the Alchemist’s Gate”, “Hell Is the Absence of God”, “Tower of Babylon”, and “Understand” are great starts. “Understand” is my favorite.↩︎\nI started collecting some Hindustani vocabulary that might interest you: https://www.are.na/harsh/hindustani-hindi-urdu-vocab↩︎\nOn a sidenote, I’ve never respected anyone more than Mahatma Gandhi. No one has valued truth and the personal search for it through experiments like him. No one before him, and no one since him.↩︎\nSomeone says “all translations are opinionated”. Then I guess what I’m saying is our opinions match.↩︎\n","permalink":"/read-listen-watch/","summary":"\u003chr /\u003e\n\u003cp\u003eAll the ones marked * are freely available at the link.\u003c/p\u003e\n\u003cp\u003eThis will be updated over time.\u003c/p\u003e\n\u003chr /\u003e\n\u003cp\u003eThe following is the grand list of recommendations from me.\nCurrently, it has reading, listening, and watching recommendations.\nIt \u003cem\u003emay\u003c/em\u003e be updated with other recs, but I can’t promise.\nThe last time I posted a \u003ca href=\"https://blog.harsh17.in/what-i-find-interesting/\"\u003eRec-List\u003c/a\u003e was three years ago on my father’s birthday.\u003c/p\u003e\n\u003cdiv id=\"read\" class=\"section level1\"\u003e\n\u003ch1\u003eRead 📑\u003c/h1\u003e\n\u003cdiv id=\"short-reads\" class=\"section level2\"\u003e\n\u003ch2\u003eShort Reads\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eShort Stories Collections\u003c/strong\u003e\u003c/p\u003e","title":"Recommendations: List of Things to Read, Listen and Watch"},{"content":"Kaj Sotala writes:\nUsually, most of us are - on some implicit level - operating off a belief that we need to experience pleasant feelings and need to avoid experiencing unpleasant feelings. In a sense, thinking about getting into an unpleasant or painful situation may feel almost like death: if we think that the experience would be unpleasant enough, then no matter how brief it might be, we might do almost anything to avoid ending up there.\nThere’s a sense in which this is absurd. After all, a moment of discomfort is just that - a moment of discomfort. By itself, it won’t do us any lasting damage, and trying to avoid it can produce worse results even on its own terms.\nIndeed, the attempt to avoid unpleasant feelings might actually cause more damage than the actual event we’re trying to avoid. Similarly, the anticipation of pleasant feelings might actually be more attractive than the actual act which we’re craving.\nTo avoid living in the future, the wise recommend living in the moment — in the present moment. No one really teaches how; they just lecture that it would be a wise decision.\nObservation of Present to Live in The Moment A simple-to-state but hard-to-follow method to live in the present is to observe the present moment, whatever is happening in the present moment. 
The object of observation could be anything as long as it is occurring in the present moment. Watching a river flow, listening to the wind, working on a tough problem, etc.\nOne such event that is constantly happening at any given moment is our breath, respiration. We inhale oxygen-rich air and exhale carbon-dioxide-rich air. Respiration is a process that’s always with us as long as we are alive. (The origin of the phrase “As long as I breathe”.)\nTherefore, respiration is one process that will always be with us, in the moment, no matter what.\nWhy Observe Respiration? There are two types of nervous systems in humans: the autonomic nervous system (involuntary), which functions without our explicit instructions, and the somatic nervous system (voluntary), which functions with our explicit instructions. Most nervous system functions fall into one of these two categories. However, there is a third category, sort of like a bridge between the two, which is involuntary but can also be controlled by us.\nBreathing is a primary example: we can breathe fast or slow, deep or shallow, but even if we don’t make any effort, we continue to breathe. Other examples include blinking, swallowing, etc.1 Additionally, how we breathe is highly dependent on our state of mind at any moment. If we are angry, our breath would be faster. If we are afraid, our breath would be erratic.\n(Figure: Affective states shown alongside the dimension of respiratory rhythm. Source: Jerath, R., \u0026amp; Beveridge, C. (2020).)\nTherefore, observing breath offers a direct view into our mind, which is critical for introspection.\nWhat is Mind? At this point, let’s take a brief look at what the mind is. (Spoiler: it is more than the brain or the heart.) A human mind, called Manas (मानस or मन) in Hindi/Sanskrit/Pali, is a complex interplay of mental processes, consciousness, thoughts, perceptions, emotions, and self-awareness. It is intangible, not centered in any specific organ, but utilizes the sense organs, the brain, and quite possibly our entire body to work.\n(Fun fact: as a kid, if you asked me where the “mind” is, I’d point to my heart. Somehow, I always associated Manas मन with the heart. But now that my thoughts are more in English than Hindi, I associate it with my “brain”. In both cases, I was/am wrong.)\nBuddha spoke extensively about the importance of the Mind in the Dhammapada, Chapter 3.\nMind precedes all mental states. Mind is their chief; they are all mind-wrought. If with an impure mind a person speaks or acts, suffering follows him like the wheel that follows the foot of the ox. If with a pure mind a person speaks or acts, happiness follows him like his never-departing shadow.\nHe broke down the whole of mind into four parts that I’ve written about before: Sangya (Perception), Smriti (Memory), Vedna (Sensation or Feeling) and Sanskar (Mental conditionings). Sangya includes our sense organs: eyes, ears, nose, tongue, body/skin, and mind.2 Sangya perceives the information available in the world (sense-objects) and calls Smriti to action. Smriti is our memory bank. It pulls out every single instance that we have related to the perceived sense object and passes both the present information and the historical data about it to Vedna. Vedna, based on both inputs, decides if the present input is pleasant, unpleasant, or neutral. In each case, it manifests its decision through physical bodily sensations. Finally, Sanskar raises its head and creates a near-permanent memory of the sensations and the input. 
If the sensations were pleasant, it wants to continue having the input. If they were unpleasant, it wants to avoid having it completely.\nObserving Breath, Bare Breath, Nothing but The Breath Therefore, Buddha suggested that everyone observe their respiration (Anapana in Pali) as the basis for all awareness of the self, awareness of the mind, and awareness of the mental contents.\nSome meditation teachers started adding things to the breath: visualizing a symbol (like Om ॐ)\nor a figure of a god, verbalizing something like a Mantra, or something as “neutral” as Inhale/Exhale while observing the breath.\nAdding things to the breath certainly makes the task of observing the bare breath easier. It helps gain concentration of mind, but at a cost: it takes away the power of the breath to reveal mental contents. You can’t observe your breath change as the mental content changes.\nIntrospecting the mental content is critical if we truly want to be happy in the long term. Indeed, Buddha gave a three-pronged formulation (the “Noble Eight-fold Path”) on how to be happy (a.k.a. how to end your suffering): Shila or Morality (शील), Samadhi or Concentration of Mind (समाधि) and Panna or Wisdom (प्रज्ञा).\nLiving a moral life, however one defines it, would lead to a happy life. Having mastery over one’s mind would enable one to do what they want to do without getting distracted by the illusions and delusions of the world (Mara of Samsara). However, to be totally happy, one needs to achieve a purity of mind, free from all defilements and with the right wisdom.\nWhat are the defilements and what is the wisdom? Buddha said there are three basic defilements: craving, aversion and ignorance. These defilements keep us trapped in the cycle of suffering. We might momentarily be happy — living life like gods — but then lose it because we crave something else, or don’t like something that we have.\nPeople often realize the loss due to cravings and aversion, but a little too late, like Alexander on his deathbed. Why are they so late in this realization? Ignorance.\nBefore Alexander The Great died, he made peculiar requests about his funeral procession. First, his body should be carried in an open casket to show that even after all his wins, he remained a human who died. Second, his team of doctors should lead the march to show that even the best doctors couldn’t keep him from death. Third, his hands should be hanging out of his casket to show ultimate helplessness and the futility of his lifelong quest for power and wealth.\nComing out of his ignorance, Buddha realized craving and aversion were keeping him from true and lasting happiness. But how to get rid of craving and aversion?\nAvoiding Craving and Aversion So many wise people have said one should avoid craving and aversion to be truly happy. 
Here are some samples.\nKrishna in Bhagwat Geeta (Chapter 2):\nThe man who is self-controlled,\nwho meets the objects of senses\nwith neither craving nor aversion,\nwill attain serenity at last.\nMarcus Aurelius:\nIf you are pained by any external thing,\nit is not this thing that disturbs you,\nbut your own judgment [craving about the outcome] about it.\nAnd it is in your power to wipe out this judgment now.\nLao Tzu (Tao Te Ching, Chapter 19):\nManifest plainness, embrace simplicity,\nreduce selfishness, have few desires.\nConfucius (Analects, Book VII, Chapter 2):\nThe Master said, ‘I desire not to desire.’\nJesus Christ:\nDo not store up for yourselves treasures on earth, where moth and rust destroy, and where thieves break in and steal. But store up for yourselves treasures in heaven… For where your treasure is, there your heart will be also.\nRumi:\nWhen you let go of who you are, you become who you might be.\nFinally, Buddha:\nDefault of existence is suffering,\nCraving is the cause of suffering,\nEnd of suffering comes from the end of craving,\nMorality, concentration and wisdom end craving, and thus suffering.\nHow to achieve the end of craving and aversion? All the aforementioned wise ones told us to end our suffering by ending our craving and aversion. However, they assumed reading the wise words would change one’s mind. It does help, but not always and not completely.\nFor example, Hindu sages aimed to end their suffering by detaching themselves from the sense objects directly — undertaking austerities like avoiding delicious (sometimes any) food. Jesus promised a seat in heaven. Jains sought it by avoiding all things that might arouse passion in one. Krishna suggests “Nishkama Karma Yoga”, or selfless or desireless action, performed without any expectation of fruits or results.\nBut all of them were promises written in books that (sometimes) made logical sense. That’s not enough.\nBuddha stated the limitations of wisdom from books and scriptures (sutta-maya-panna): its authenticity and verification are always doubtful. Wisdom from intellectual and logical understanding (chinta-maya-panna) isn’t enough either: logical conclusions aren’t enough to change one’s mind completely. Only experiential wisdom (bhavna-maya-panna), gained by experiencing for oneself how craving and aversion make us suffer, can make one’s understanding authentic and unshakeable.\nHe says (Kalama Sutta):\nDo not go by reports, by legends, by traditions, by scripture, by logical conjecture, by inference, by analogies, by agreement through pondering views, by probability, or by the thought, ‘This wise person is our teacher.’\nWhen you know for yourselves that, ‘These qualities are skillful; these qualities are blameless; these qualities are wise to cultivate; these qualities, when adopted and carried out, lead to my welfare and to happiness’ — then and only then you should accept them.\nSo, what did Buddha find from his experience?\nWhen the six sense doors, that is, the six sense organs, come in contact with their respective objects (saḷāyatana paccayā phasso), a sensation arises in the body (phassa paccayā vedanā), and when the sensation is experienced, craving (taṇhā) arises (vedanā paccayā taṇhā).\nThere: Buddha discovered the missing link — sensations. Instead of avoiding the world’s pleasures, one should be unperturbed by the sensations that result from them. 
Then, and only then, one can be indifferent to the sense objects.\nVipassana: Insight to Happiness Vipassana, which literally means “to see things as they are”, is the essence of Buddha’s teaching: how to get rid of suffering.\nLeading a moral life, one develops a strong foundation to concentrate the mind. With the observation of one’s breath — Anapana Meditation — one calms the mind and gains one-pointedness. This ability to concentrate is a necessary condition to see the reality of craving, aversion and ignorance, and to develop wisdom.\nVipassana Meditation as taught by S.N. Goenka in a ten-day residential program is a critical gateway to understanding all the above wisdom at the experiential level. Reading this blog (or any essay or scripture), or even logically coming to this conclusion, wouldn’t make you free from craving and aversion at the deepest levels of the mind.\nOne has to experience the truths of one’s existence — dukkha (suffering), anicca (impermanence) and anatta (no-self) — to walk towards happiness. While sitting for ten hours a day for ten days, one very clearly sees the truth of suffering as the default. There is always some pain or other in the legs. You will be distracted by your thoughts about your past experiences; many will show how you suffered because you craved something. The reality of suffering becomes a truth, realized within one’s own body and mind.\nAlongside, another truth that becomes obvious is the changing nature of everything. Every sensation that comes up goes away sooner or later. All painful sensations from sitting long hours pass away eventually. The changing nature of oneself shows how everything around us is changing as well. The changing nature isn’t just of one’s own body but of everything in the world — thoughts, buildings, civilizations, culture, trends, people — everything. All temporal things, whether material or mental, are compounded objects in a continuous change of condition, subject to decline and destruction.\nDuring the meditation, your mind will get distracted. And that’s okay. Your mind won’t listen to you until you listen to it. It needs to untie all knots and finish all open thoughts. The introspection through distraction while you train your mind to concentrate is critical. Every time you bring your attention back to the breath or a sensation, you are winning over your mind.\nThe final truth — anatta or no-self — is the realization that the entire body of existence, our mind as well as our body, is just our senses acting on sense objects. It is one of the hardest truths to realize but also the most important one.\nFree from Conditioning During a Vipassana course, you’re under Noble Silence. You’re not allowed to speak with anyone, except to ask questions, etc.\nFrom my own experience, I had realized the importance of silence breaks (मौन-व्रत). I could concentrate better, get less distracted and solve harder problems. Taking cues from my friend Ehtesham Izhar, who would take silence breaks during Ramadan to study all day, and inspired by Mahatma Gandhi, who remained silent all day on Mondays, I tried it with impeccable results. Even today, I sometimes do it for hours and it works great to kill distractions.\nWith most outside distractions gone, when you sit down to observe your sensations during the meditation, you can sense the arising and passing of the sensations. A conditioning of aversion would often manifest itself as gross sensations like pain. 
A conditioning of craving would show up as subtle sensations like tingling.\nAt all steps, the most important requirement is to be equanimous, i.e. to observe the sensations (and thus the reality of now) exactly as they are. When you observe the reality (of sensations or respiration) within yourself “as it is” (यथाभूत), objectively, without reaction, you develop resilience against your mind’s past conditioning.\nSlowly and gradually, each conditioning bubbles up as sensations and then passes away as you remain non-reactive. You become better at taking actions based on the input, not through blind reactions based on unconscious conditionings.\nAwareness and Equanimity “Awareness and Equanimity are like two wheels of a cart. Neither can be larger or smaller than the other. Both are equally important for moving straight on this path,” S.N. Goenka says.\nIn my previous course, I was blown away by my experience of awareness. I was surprised by how little I knew about my own breath and my own mind. In this course, I realized the transient nature of all things — how the sensations passed just like that. As a silent observer, I developed my tool of equanimity.\nAround Day 6 or so, I applied some bug spray before walking around to avoid kisses from mosquitoes and no-see-ums all around. It worked great for an hour and then the bugs came back with even stronger force (impermanence!). From then on, I chose to equanimously observe the sensations of these bugs landing on me instead of trying to change the reality of them coming at me.\nThat day, I made the most progress on making myself equanimous. I was able to be non-reactive to circumstances outside.\nSince the last course, I hadn’t kept up my regular meditation practice. But now, I am sure I need to keep up the practice. Else, this experiential wisdom will reduce to logical and bookish wisdom.\nConclusion Buddha says meditation is the only way to taste the fruit of Dhamma (Laws of Nature). My second experience made me realize the transient nature of everything, including my own experience and myself. I realized how I need to be equanimous to my own sensations to act objectively in the world. Most importantly, I realized I cannot miss my meditation sittings. Else, the practice will take a backseat as an intellectual game; the process of changing me for permanent happiness will stop.\nFurthermore, I realized that only I can redeem myself — I’m the sole cause of my happiness and my sadness. “Attā Hi Attanō Nāthō”, Buddha said, which means “I am my own master”.\nCorrection The first version of this article said it was Napoleon Bonaparte who had the special requests for his funeral. It was actually Alexander The Great; the article has been updated accordingly. (Thanks Nikhil!)\nSome are even able to control their heart rate voluntarily, depending on practice. It may be possible to control all “involuntary” processes, if only we are aware enough. Maybe; not sure.↩︎\nSome people have trouble accepting the mind as a sense organ. 
Here mind refers to the sense organ that senses thoughts, like thoughts of fear, anger, love, etc.↩︎\n","permalink":"/vipassana2/","summary":"\u003cp\u003e\u003ca href=\"https://www.lesswrong.com/posts/mELQFMi9egPn5EAjK/my-attempt-to-explain-looking-insight-meditation-and\"\u003eKaj Sotala\u003c/a\u003e writes:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eUsually, most of us are - on some implicit level - operating off a belief that we \u003cstrong\u003eneed\u003c/strong\u003e to experience pleasant feelings and need to avoid experiencing unpleasant feelings.\nIn a sense, thinking about getting into an unpleasant or painful situation may feel almost like death: if we think that the experience would be unpleasant enough, then no matter how brief it might be, we might do almost anything to avoid ending up there.\u003c/p\u003e","title":"Thoughts about Vipassana Meditation (and My Second Experience)"},{"content":" Like many blog posts, this one started with an idea conveyed in a Tweet. A Tweet where I vented about how the internet has changed so much that the art of exploration — called surfing because you ride on the waves of information — isn’t as fun anymore. There are several reasons why I don’t enjoy it as much as in the early days of the internet, but here are a few of my guesses:\nNovelty Effect Exploring anything for the first time is fun simply because you lack knowledge about the subject. When I was faking my date of birth to be over 18 to sign up for Orkut, downloading random wallpapers from SantaBanta.com (which has since degraded to explicit images), and getting Bollywood songs from songs.pk, I was doing things that were novel to me. I experienced excitement and heightened engagement when I encountered something new or unfamiliar. Now, most websites are somewhat familiar and this effect is gone.\nQuality of internet is down In the old days, I could open a website and browse most of its content without being asked to create an account, add my credit card, etc. Now, every site asks me to create an account to read on; some even ask for credit card details. A much worse experience.\nThe ads used to be on the sides of the pages and in popups (which were annoying af). But still, my mind would automatically ignore the ads and focus on the content. Today, I use AdBlockers so this is not much of an issue, but when I do use a computer without an AdBlocker, I’m surprised to see 2/5 results on Google are ads, 3/5 “tweets” are ads, 2/5 posts on Reddit are ads.\nHow much more money do these capitalists want from me? Isn’t a good user experience worth something more than a few `$$$`?\nSocial media sucks badly I’ve relied too much on Instagram, Twitter and Reddit to keep me updated on what’s going on. However, they seem to be full of noisy, useless content, and hardly ever useful things.\nI’ve relied too much on Google to be my window to the internet Many. have. debated. what’s. wrong. with. search.\nSome have even directly called out Prabhakar Raghavan, a computer scientist with a management consultancy background, as the core problem.\nThere is ample evidence that Google Search isn’t what it used to be. I understand search is a hard problem to solve — constantly avoiding SEO-optimized posts in favour of quality posts — but these days, I feel Google tries too hard to understand what I’m looking for, instead of serving me what I ask.\nThere are several good alternatives to Google as a search engine, but I need to remember to use them:\nPerplexity: A great alternative that presents the answer directly with footnote citations. 
Marginalia.nu: It shows me sites that I wouldn’t have seen otherwise at all. This is the only one that directly aids in exploration. (It also makes me wonder: how can a search engine made by a single person be so good?) Exa.ai: A search engine that tries to “understand” the context around my keyword. This is what Google circa 2024 wants to be, though I don’t think this is a good North Star. However, the cool part of this tool is the ability to refine my search to PDFs, companies, research papers, tweets, and even personal websites. Gigabrain: A Reddit search engine that finds answers by searching only Reddit data. Long live “site:reddit.com”! udm14: When Google launched all the AI features including overviews recently, it also added a neat URL parameter (udm=14, hence the name) for a much cleaner, web-results-only Google, as it used to be earlier. (Hat-tip to Meenal for suggesting these.) I have hot keys set up in Arc for all of these site searches, which you can do for most browsers, including Chrome.\nDanluu.com compares search results from Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT. A long but interesting read.\nTools have stagnated I don’t think this is a big problem because browsers — the primary entrance to the internet — have traditionally been terrible. Internet Explorer was slowww, Firefox was geeky, Chrome was heavy on memory and CPU, and none of them had really thought from scratch about what a gateway to the internet should feel like.\nGoogle once tried to replace the website thumbnails on Chrome’s start-up page with website icons: the screenshot of YouTube.com became a [YT] logo. As one can expect, the [YT] logo is clearer given the small screen space the thumbnail/logo gets. The users loved it — icons were more direct than a website’s thumbnail.\nHowever, the number of searches went down almost immediately as people clicked on the [YT] logo instead of typing YouTube in Chrome’s search bar. Google reversed the change, as ad revenue is critical to its sustenance. This story showed me how Google will never make a better tool that doesn’t rely on search as the starting point.\nAs for alternative tooling, I use Arc today as my primary browser. Arc feels a lot more humane and easy to use.\nAre.na for Exploration Are.na https://www.are.na/explore is a simple moodboard-like site where you can make collections of anything — media (audio, video, PDF), text (written text or quotes from websites), links, really anything — and others can include them in their collections if they like. The use cases are so broad that there is an Are.na channel called “How do you describe Are.na at a party?”\nThere is also an hour-long designers’ talk on Are.na. I’ve used it to create collections of ideas that aren’t long enough to be a blog post but aren’t so personal that they should stay on my phone. Like there is one for the book House of Leaves (one book that you must read), one for thoughts around Vipassana, another around the Kohinoor diamond, AI stuff, coffee, cool Hindi/Hindustani/Urdu words, anti-capitalism sentiments, non-fictional news, India and more.\nI often find myself spending time on their explore page to find interesting content that isn’t asking for engagement. It doesn’t feel commercial. It’s a small group of people creating something useful.\nLife would be miserable if we only spent time in commercial spaces, because not all value can be captured and supported in a commercial context. We all know this, so it is a pity how overfitted and commercialized the internet, our second home, has become.\n— Frank Chimero, The Good Room\nWhat About the Future? Honestly, I don’t know. 
As I grow older, I will miss more and more of the tools that once existed — the curse of ageing. For whatever reasons, they will die out, as everything is impermanent. I wonder what will exist in the future.\n","permalink":"/web-surfing-isn-t-fun-anymore/","summary":"\u003cdiv class=\"float\"\u003e\n\u003cimg src=\"images/tweet.png\" alt=\"Like many blog posts, this one started with an idea conveyed in a Tweet. A Tweet where I vented how the internet has changed so much that the art of exploration — called surfing because riding on the waves of information — wasn’t as fun anymore.\" /\u003e\n\u003cdiv class=\"figcaption\"\u003eLike many blog posts, this one started with an idea conveyed in a Tweet. A Tweet where I vented how the internet has changed so much that the art of exploration — called surfing because riding on the waves of information — wasn’t as fun anymore.\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eThere are several reasons why I don’t enjoy it as much as the early days of internet but here are a few of my guesses:\u003c/p\u003e","title":"Web surfing isn't fun anymore 🏄"},{"content":"Terminal can be quite fun to play around with. At the encouragement of my friend Pablo, I ventured around to discover some interesting ones.\nTo use any of them, you should fire up Terminal on your macOS/Linux/Unix machine and install the right packages using apt-get on Linux or brew on macOS.\nFirst, install Homebrew if you don\u0026rsquo;t have it already.\n/bin/bash -c \u0026#34;$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\u0026#34; Once you have that, install any given package using the brew install command like below.\nbrew install cowsay Cowsay and Fortune cowsay makes a cow say whatever you want it to say. It is particularly fun to join it with fortune, which will give you a random message (see the bonus tip at the end of this post).\nCode from \u0026ldquo;The Matrix\u0026rdquo; You can see the code like in the Matrix movies with the command cmatrix. Install the package with brew install cmatrix.\nText in Colour Using toilet you can display text in a variety of colours and font styles. You can find the full list of options here.\nTrain in Terminal You can watch a choo-choo train in Terminal with the command sl.\nTelehack: A Collection of Terminal Apps You can watch Star Wars and do some pretty interesting \u0026ldquo;time pass\u0026rdquo; things on Terminal. telehack.com is an online website that you can also call via your terminal as:\ntelnet telehack.com Full documentation of Telehack is available at https://telehack.com/telehack.html. Let me give you an overview of some of my favourite commands.\naquarium: An ASCII art animation of an aquarium/sea. 2048: A sliding tile puzzle game. ching: Consult \u0026ldquo;The Book of Changes\u0026rdquo; (I Ching). eliza: Converse with an AI psychotherapist. joke [search]: Show a random joke from the joke database. morse: Encode or decode Morse code. qr: Generate a QR code. rain: Animated raindrops display in ASCII. phoon: Show the phase of the moon right now. roll: Roll animated dice. typespeed: A fun game to test your typing speed. The best one in my opinion is watching Star Wars in your Terminal. I don\u0026rsquo;t have the patience to finish it but it\u0026rsquo;s amazing to know someone put in the effort to make it work.
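Bonus tip: once you have both installed (brew install fortune cowsay), pipe them together with fortune | cowsay and you get a cow reciting a random quote on every run. If you add that line to your shell startup file (e.g., ~/.zshrc), the cow will greet you every time you open a new Terminal window.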
\n","permalink":"/terminal/","summary":"\u003cp\u003eTerminal can be quite fun to play around with. At the encouragement of my friend \u003ca href=\"https://pablorious.github.io/\"\u003ePablo\u003c/a\u003e, I ventured around to discover some interesting ones.\u003c/p\u003e\n\u003cp\u003eTo use any of them, you should fire up Terminal on your macOS/Linux/Unix machine and install the right packages using \u003ccode\u003eapt-get\u003c/code\u003e on Linux or \u003ccode\u003ebrew\u003c/code\u003e on macOS.\u003c/p\u003e\n\u003cp\u003eFirst, install \u003ca href=\"https://brew.sh/\"\u003eHomebrew\u003c/a\u003e if you don\u0026rsquo;t have it already.\u003c/p\u003e\n\u003cpre tabindex=\"0\"\u003e\u003ccode\u003e/bin/bash -c \u0026#34;$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\u0026#34;\n\u003c/code\u003e\u003c/pre\u003e\u003cp\u003eOnce you have that, install any given package using the \u003ccode\u003ebrew install\u003c/code\u003e command like below.\u003c/p\u003e","title":"Have fun with Terminal on Mac/Linux"},{"content":"The following are my notes from the book Range: Why Generalists Triumph in a Specialized World by David Epstein. In today\u0026rsquo;s hyper-specialized world, it might seem counterintuitive that broadening one\u0026rsquo;s experiences and delaying specialization could lead to greater success. However, David Epstein provides compelling evidence and stories to support the idea that being a generalist in a specialized world is not just advantageous; it\u0026rsquo;s crucial.\nHere are some of my key takeaways from the book:\nThe cult of the head start Early specialization is not the only path to success. It is not even the best path to success. Early specialization works for some but not for most. The most successful people are those who have a range of experiences and skills. They are generalists, not specialists.\nPolgar Sisters The Polgár sisters - Judit, Susan, and Sofia - are renowned as some of the strongest chess players in the world. Their father, László Polgár, was a psychologist who believed that \u0026ldquo;geniuses are made, not born\u0026rdquo; and that early training and specialization were key. Along with his wife, Klara, he decided to test his theory by raising his children to be chess prodigies. The experiment was a huge success. Susan Polgár was the first woman to earn the Grandmaster title in 1991, at age 22. Judit Polgár became the youngest person ever to achieve the Grandmaster title in 1991, at age 15, breaking Bobby Fischer\u0026rsquo;s previous record by a month. All three sisters have been ranked among the top players in the world at various times.\nTiger Woods: Child Prodigy Tiger Woods is often held up as an example of the benefits of early specialization. He started playing golf at the age of two and was a prodigy from the start. He won the Under-10 tournament at the age of four. He won his first major tournament at the age of 21 and went on to become one of the greatest golfers of all time.\nRoger Federer: Generalist Roger Federer is often held up as an example of the benefits of late specialization. He played a variety of sports as a child, including soccer, basketball, and badminton, before settling on tennis. He did not start playing tennis seriously until the age of 12. He won his first major tournament at the age of 21 and went on to become one of the greatest tennis players of all time. 
At the age of 30, when most tennis players have retired, he was still winning major tournaments.\nSuccess Depends on Learning Environment: Kind vs Wicked Learning Environments One explanation for why some fields have child prodigies while others do not is that some fields have a \u0026ldquo;kind\u0026rdquo; learning environment while others have a \u0026ldquo;wicked\u0026rdquo; learning environment. In a kind learning environment, the rules are clear, the goals are stable, and the feedback is immediate and informative.\nIn a wicked learning environment, the rules are unclear, the goals are constantly changing, and the feedback is delayed or nonexistent. Fields with kind learning environments tend to produce child prodigies, while fields with wicked learning environments tend to produce late bloomers.\nRepository of Facts vs Learning: Tactical vs Strategic Knowledge Kind learning environments value tactical knowledge and a repository of facts. Wicked learning environments value strategic knowledge and the ability to think critically and creatively. Kind learning environments are best suited to specialists, while wicked learning environments are best suited to generalists.\nSchool and Exams are Examples of Kind Learning Environments Chess, classical music, mathematics, and computer programming are examples of fields with kind learning environments. They have clear rules, stable goals, and immediate and informative feedback. They tend to produce child prodigies because years of training lead to the development of huge tactical knowledge.\nOne study found that chess grandmasters were very good at remembering the positions of chess pieces, often after seeing them for just seconds. They were shown a chess game in progress and asked to recall it, and many did so with perfect accuracy. However, when they were shown a random assortment of chess pieces on the board \u0026mdash; an arrangement that\u0026rsquo;d never occur in an actual game \u0026mdash; they were no better than an average human.\nThey didn\u0026rsquo;t have photographic memory but had learnt the \u0026ldquo;tricks of the trade\u0026rdquo; with years of practice.\nExamples of Wicked Learning Environments Science, art, politics, and business are examples of fields with wicked learning environments. They have unclear rules, constantly changing goals, and delayed or nonexistent feedback. They tend to produce late bloomers.\nNobel prize winning scientists tend to have hobbies that they care about, often completely unrelated to their area of expertise. Other scientists? Usually not.\nIt helps to learn many subjects at once because our brains can draw connections across topics.\nImplications for Artificial Intelligence Machines are very good at achieving one task, but one task alone. As soon as the task changes, they struggle to perform well. A computer program was taught to play a strategy computer game \u0026mdash; only taught the rules, not the strategy. With many, many tries, the program figured out how to beat virtually every human online. Why? Because it could evaluate the potential strategies following its own gameplay, while making its move, much faster and with higher accuracy than a human.\nHowever, as soon as a similar tool was provided to the human players (one which only showed the computer\u0026rsquo;s potential next step after the human\u0026rsquo;s move), the computer program lost all its advantage. 
A simple computer + human combination beat an advanced machine hitherto unchallenged.\nTo me, this shows how AI together with humans will drive human progress so much better and faster than AI alone. That\u0026rsquo;s why AI-doomerism isn\u0026rsquo;t healthy.\nConclusion Like many non-fiction books, this one was a little too long. I believe the book could\u0026rsquo;ve been much shorter, perhaps a pamphlet. However, the key idea that we should expand our knowledge base so that we can draw inspiration from multiple sectors was novel and good to hear. Feel free to skip chapters; I don\u0026rsquo;t think you\u0026rsquo;ll lose the overall message.\nFree Audiobooks I was listening to this book using Libby, which provides free audiobooks, courtesy of your local library. Audiobooks are a great way to kill time productively and learn more. It is very likely that your local library already provides access for free. Just so you know, I\u0026rsquo;m a member of Knoxville Public Library but I wasn\u0026rsquo;t paid to write this.\n","permalink":"/range/","summary":"\u003cp\u003eThe following are my notes from the Book \u003ca href=\"https://www.goodreads.com/book/show/41795733-range\"\u003eRange: Why Generalists Triumph in a Specialized World by David Epstein\u003c/a\u003e. In today\u0026rsquo;s hyper-specialized world, it might seem counterintuitive that broadening one\u0026rsquo;s experiences and delaying specialization could lead to greater success. However, David Epstein provides compelling evidence and stories to support the idea that being a generalist in a specialized world is not just advantageous; it\u0026rsquo;s crucial.\u003c/p\u003e\n\u003cp\u003eHere are some of my key takeaways from the book:\u003c/p\u003e","title":"Book Notes | Range: Why Generalists Triumph in a Specialized World"},{"content":"Political System in India For a fully functioning democratic government, it is essential to have clear insights into the political parties involved. In India, any citizen can stand for election and anyone can start a party. Thus, we have thousands of parties that contest local, state, and national elections.\nThe Election Commission of India is the constitutional body empowered by Article 324 of the Indian Constitution, vested with the power to supervise, direct, control, and conduct all elections to the national parliament, state legislatures, and local municipal bodies, as well as elections to the offices of President and Vice-president of India.\nThe President and Vice-president have nominal powers according to the Indian constitution. The President is the head of state, akin to the British monarch, and the final signatory who makes a bill the law of the land. The Vice-president, on the other hand, has slightly more powers; they conduct the activities of the Rajya Sabha (Upper House). Upper House doesn\u0026rsquo;t mean it is more powerful than the Lower House (Lok Sabha); in fact, I\u0026rsquo;d argue it has fewer powers, particularly regarding economic bills, as they must be initiated and passed by Lok Sabha before reaching Rajya Sabha. Even in terms of strength, Lok Sabha has 545 seats whereas Rajya Sabha has 245 seats.\nThe Election Commission is the guardian of elections. Before every election, it releases the Model Code of Conduct for political parties, which guides how all parties involved should behave. During the election period, it has absolute control over all state machinery and can even call in paramilitary help if necessary.\nAre politicians rich? 
The Election Commission has long sought to reduce the influence of money in elections. The commission appoints Indian Revenue Service (IRS) officers from the Income Tax Department as Election Observers (Expenditure) for all elections, most notably in the R.K. Nagar constituency in Chennai in 2017. There are strict limits on how much political parties can spend. Even the campaign period has been reduced from 21 days to 14 days, in order to cut down on the money spent by parties.\nThe commission also requires all candidates to submit an affidavit on the assets they own before the elections (when they sign up for elections), which is made public immediately. Any lies result in criminal prosecution. (BJP Rajya Sabha MP Sushil Kumar Modi presented the Parliamentary Commission Report, which calls for disqualification and stricter punishments.)\nThe affidavit data has resulted in some interesting insights: for example, PM Narendra Modi owns no immovable properties like a house or a vehicle; most of his money is in cash, certificates of deposit, and four gold rings. His net worth is around ₹ 2 crore (around $250,000), and he has no pending criminal cases against him. Rahul Gandhi\u0026rsquo;s net worth is around ₹ 15 crore (around $1.8 million). The Association for Democratic Reforms (ADR) has created an easy-to-use website collating information from everyone\u0026rsquo;s affidavits. I implore you to explore the details at https://myneta.info/. The present government has also been publishing ownership details of all ministers since 2013, data for which is available here.\nIf it is still not obvious to you, let me spell it out for you: most politicians in India are really rich. (Although the impact of having a higher net worth on winning elections isn\u0026rsquo;t clear. According to ADR, there\u0026rsquo;s no significant correlation between the two, indicating voters take a holistic view of the candidates. I\u0026rsquo;m skeptical of this conclusion.)\nWhat about political parties? Indian political parties may have several income sources, some of which they have to report, while others they don\u0026rsquo;t. For instance, they have to report all voluntary donations that are larger than ₹ 20,000, sales of assets, membership fees, interest income, etc. They don\u0026rsquo;t have to report contributions from meetings, and most importantly, \u0026ldquo;Electoral Bonds\u0026rdquo;. We will come back to Electoral Bonds shortly.\nElection expenditure in India is also among the highest in the world. For example, in the 2019 General (Federal) Elections, an estimated ₹ 55,000 crore ($8.6 billion) was spent. That is higher than the $6.5 billion spent in the US Presidential Elections in 2020. So, a developing country spends more on elections than a developed country. LOL.\nTransparency in Electoral Funding: Electoral Bonds In 2017, the then Finance Minister Arun Jaitley introduced \u0026ldquo;The Finance Bill, 2017\u0026rdquo; in Lok Sabha, where it was classified as a \u0026ldquo;money bill\u0026rdquo;. Recall that money bills are introduced and passed in Lok Sabha, and Rajya Sabha cannot reject them once Lok Sabha has passed them. Around that time, BJP (the ruling party) didn\u0026rsquo;t have a majority in Rajya Sabha, and thus the bill became an act without ever being introduced in the upper house. 
The bill involved amendments to four key laws: the Representation of the People Act, the Income Tax Act, the Reserve Bank of India Act, and the Companies Act.\n\u0026ldquo;Corporate-Political Party Bonds\u0026rdquo;, as Deccan Herald called them, would be financial instruments that anyone can buy and send to any political party of their choice. Once these bonds are bought and transferred, they completely blind everyone to the knowledge of political funding, innocuously hiding the detail of WHO donated to WHOM. A more insidious move, hidden in edits to the Companies Act, was to remove the 7.5% ceiling on the proportion of profits of a company that could be donated to a political party. Furthermore, the provision that required companies to disclose the names of political parties to whom they had donated, and how much, was completely scrapped! Rubbing salt into the wound, both parties would still get tax exemptions. Another LOL.\nThe two amendments taken together could result in shell companies being created for the sole purpose of political funding. Now even mafia, gangs, and dubious foreign entities can easily fund Indian political parties. Is that what Jaitley meant by \u0026ldquo;Transparency in Electoral Funding\u0026rdquo; in his budget speech? Or what the government notification titled \u0026ldquo;The Scheme of Electoral Bonds to cleanse the system of political funding in the country\u0026rdquo; promised? It seems in the complete opposite direction to me!\nOpacity of Electoral Bonds Soon, ADR (the NGO behind https://myneta.info/), Common Cause (another NGO), and the Communist Party of India (Marxist) filed petitions with the Supreme Court of India opposing the scheme. In 2019, the Election Commission opposed the scheme, citing concerns about transparency in political finance. The ECI also disclosed that it had shared these concerns in a letter to the Union government dated May 16, 2017, along with concerns about how the scheme could prevent information regarding foreign funding from coming out, \u0026ldquo;which could lead to Indian policies being influenced by foreign companies.\u0026rdquo;\nThe Supreme Court, on ECI\u0026rsquo;s recommendation, asked political parties to submit all details of donations, donors, and bank account numbers to ECI in a sealed envelope. But the information was still locked away from common citizens. Right to Information was the last resort. All the while, more and more political funding was coming from \u0026ldquo;unknown\u0026rdquo; sources (primarily electoral bonds).\nMeanwhile, the popularity of electoral bonds was higher than ever. In FY 2021-22, 97% of the income of Trinamool Congress (West Bengal) was from unknown sources! Source.\nParty | Total Income | From Unknown Sources | % from Unknown Sources\nDravida Munnetra Kazhagam | ₹ 318.745 crore | ₹ 306.025 crore | 96%\nBiju Janata Dal | ₹ 307.288 crore | ₹ 291.096 crore | 95%\nBharat Rashtra Samithi | ₹ 218.112 crore | ₹ 153.037 crore | 70%\nYSR Congress Party | ₹ 93.724 crore | ₹ 60.0168 crore | 64%\nJanata Dal (United) | ₹ 86.555 crore | ₹ 48.3617 crore | 56%\nSamajwadi Party | ₹ 61.011 crore | ₹ 3.66 crore | 6%\nShiromani Akali Dal | ₹ 25.414 crore | ₹ 12.1987 crore | 48%\nAll India Anna Dravida Munnetra Kazhagam | ₹ 25.263 crore | nil | 0%\nMaharashtra Navnirman Sena | ₹ 6.7683 crore | ₹ 5.0762 crore | 75%\nTelugu Desam Party | ₹ 6.028 crore | ₹ 3.667 crore | 61%\nSupreme Court Strikes Back! On February 15, 2024, the Supreme Court of India (SCI) declared the Electoral Bond Scheme, and the provisions of the Finance Act 2017 related to it, unconstitutional. It found the act in violation of citizens\u0026rsquo; Right to Information. 
Furthermore, the State Bank of India (SBI, the government-owned bank that issues, sells, and cashes electoral bonds) was directed to share complete details on issuance and redemption with the ECI. The ECI would then put the details on its website for public knowledge by March 13, 2024 (two days ago, as I write this).
Since the 2024 General Elections are due in a month, the details would be major news. SBI asked for an extension, which was denied by the SCI, and the data was made available at https://eci.gov.in/. (The site is geo-fenced to Indian IP addresses, I think.)
But I managed to get the data, thanks to my Twitter circle.
We have the data now. Yayy! But so what?
Biggest Beneficiaries: Political Parties
I wanted to identify who the contributors are and whom they are contributing to. But once I had the data, I realized it was essentially worthless without the mapping between the donors and the receivers!
For starters, we can find which parties benefitted the most from the scheme. Even without looking at the data, I can say it would be the ruling BJP: it is the largest party, the incumbent, the initiator of the scheme, and generally friendly to corporate interests.
BJP is the biggest beneficiary, while Mamata Banerjee's Trinamool Congress (which gives me really bad vibes from their conduct in the Bengal elections) is the second highest. Considering that 97% of their funding came from electoral bonds, I'd argue that their case is more important than BJP's.
Biggest Donors: Corporations
The largest corporate buyers of electoral bonds can be identified from the data, but little can be understood without knowing whom they gifted their electoral bonds to!
Perhaps the most striking thing about the list of donors is the names it does not include. The Adani Group, the giant conglomerate whose value has grown by almost 1,000 percent since Mr. Modi took power, appears nowhere. Mukesh Ambani, Asia's richest man, does not either, although his Reliance Industries has a roundabout connection to the third-largest donor listed, Qwik Supply Chain. Reliance released a statement saying Qwik is not a subsidiary of any Reliance company.
The biggest purchaser is Future Gaming and Hotel Services, which snapped up ₹1,368 crore ($165 million) in bonds. That is many times greater than the profits it has reported in any year! The company's owner, Santiago Martin, often styled as India's "lottery king," was under investigation for money laundering.
The second-largest contributor is Megha Engineering and Infrastructures Limited (MEIL/Megha), which purchased bonds worth ₹996 crore ($120 million). The current owner, PV Krishna Reddy, whose wealth increased by ₹24,700 crore in a year, is among the richest 100 people in India. His house, "Diamond House", literally looks like a diamond.
The third-largest contributor, Qwik Supply, whose registered office is at Navi Mumbai's Dhirubhai Ambani Knowledge City (DAKC), bought bonds worth ₹410 crore. To put that in perspective, the firm had a revenue of ₹500 crore in 2022-23. The company spent almost all of what it earned on funding elections.
The entire list will leave you gasping for more information. Nirmala Sitharaman, our finance minister, dismissed any allegation of quid pro quo, saying that there was nothing to establish a link between raids by investigative agencies and funding, and that any such charges were mere "assumptions".
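If you want to poke at the disclosure yourself, the aggregations above take only a few lines of pandas. Here is a minimal sketch, assuming the Excel file linked in the Datasets section has one sheet for purchasers and one for parties; the sheet and column names below are my guesses, so check them against the actual file and rename accordingly:

```python
# A minimal sketch for slicing the SBI disclosure with pandas.
# Sheet and column names are my assumptions -- inspect the actual
# Excel file from the Datasets section and adjust to match.
import pandas as pd

RAW = "Electoral Bonds Raw Data.xlsx"

buyers = pd.read_excel(RAW, sheet_name="Purchaser")  # who bought bonds
parties = pd.read_excel(RAW, sheet_name="Party")     # who encashed them

# Biggest donors: total value of bonds bought, converted to ₹ crore
# (1 crore = 1e7, assuming the Denomination column is in rupees).
top_buyers = (
    buyers.groupby("Purchaser Name")["Denomination"]
    .sum().div(1e7).sort_values(ascending=False).head(10)
)

# Biggest beneficiaries: total value of bonds encashed, in ₹ crore.
top_parties = (
    parties.groupby("Party Name")["Denomination"]
    .sum().div(1e7).sort_values(ascending=False).head(10)
)

print(top_buyers, top_parties, sep="\n\n")
```

Note that the two tables share no bond identifier, so there is no join to compute: the donor-to-party mapping this post keeps complaining about is genuinely absent from the release.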
What else?
There's not much more that we can explore here without relying on other information. But many questions remain:
Did any company get quid pro quo benefits after its funding?
How did Reliance/Adani support the governments? I'm fairly certain they supported the BJP (and maybe also the opposition). But how? Why do they not feature in this dataset? Did they just donate cash (which wasn't reported through electoral bonds but through "known" sources of funding)?
Now that the program is scrapped, what happens to the uncashed bonds?
Will SBI also release the funding graph, beyond just the two tables? Will the Supreme Court force more data releases like this? Will it also force companies to declare their donations like they used to before 2017?
I'll sit back and relax while more people take a look at this dataset. In 4-5 hours of work, I learnt so much about election funding; those who have spent months researching this topic will come up with much better insights.
I hope you learnt something about election funding in general. Here is a set of related bookmarks that may interest you.
Datasets
Thanks to Rishabh Anand, I got access to the complete dataset as well as a crowdsourced version which is investigating corresponding raids by federal authorities and significant government contracts won.
For this analysis, I used the raw data, which you can access here: https://github.com/harshvardhaniimi/personal-website/blob/fe7f928810a8e883a6507bd3c4c602a0d2c69854/content/blog/2024-03-15-who-s-funding-the-politics-in-india/Electoral%20Bonds%20Raw%20Data.xlsx
For easier viewing, I'm also uploading the same raw data with two pivot tables in a Google Sheet: https://docs.google.com/spreadsheets/d/1XqqWKoVtQwrNxsKlSiXBFZnCOyoAERBvHreUZ7PB-9I/edit?usp=sharing
Finally, there have been some collaborative efforts to crowdsource information: https://docs.google.com/spreadsheets/d/17iCn40APazZ_v0Ce17l51ZlYNf0tweCaywTjK6Pfef8/edit#gid=0
Let me know in the comments what you think, or send me an email at hello@harsh17.in :D
","permalink":"/who-s-funding-the-politics-in-india/","summary":"An order from India's Supreme Court forced the government-owned State Bank of India to disclose details of electoral bonds, which allow anonymous political donations. The data revealed the many companies and individuals who donated over $1.7 billion to political parties through these bonds.","title":"Who's funding the politics in India?"},{"content":"The world has enough for everyone's need, but not enough for everyone's greed.
— Mahatma Gandhi
Every once in a while, it is crucial to review where we are and where we are headed. You might be in the habit of doing a yearly or a monthly review. Shouldn't we do a century-long review? Would we deem the values we take for granted today still relevant in a world where we are knee-deep in polycrisis?
Perils of GDP as KPI
It's a five-minute bike ride to the train station. On brisk mornings like this, I wear gloves and pack a warm coffee for the commute. My work buddy Lucy gets on two stops down, always with a pair of scones, wheeling her bike next to mine in the locker downstairs before joining me in the sunny coach section. Half an hour later, we unload the bikes and race each other along the greenway to our office. Twice a week, this; twice a week, we co-work from a cafe in the suburbs. The rest of the week is ours to enjoy.
— Betsy Ruckman
The capitalist system, with its emphasis on growth and efficiency, often quantifies success in terms of Gross Domestic Product (GDP), the total value of goods and services produced within a country over a specific time period. Yet, as we scrutinize the Phillips Curve — an economic concept that suggests an inverse relationship between rates of unemployment and corresponding rates of inflation — we must ask: does our relentless pursuit of economic indicators, such as GDP and low unemployment, overshadow the quest for happiness and well-being?[1]
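For reference, the expectations-augmented textbook form of that relationship can be written as follows (the notation is mine, not anything from the essay):

```latex
% Expectations-augmented Phillips curve (standard textbook form)
% pi_t: inflation, pi_t^e: expected inflation,
% u_t: unemployment rate, u_n: its "natural" rate, beta > 0
\pi_t = \pi_t^{e} - \beta \, (u_t - u_n)
```

Inflation trades off against unemployment only for a given level of expected inflation; once expectations themselves move, as in the 1970s stagflation discussed in the footnote, the simple trade-off breaks down.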
I think this is a good time to clarify that I am not against economic growth. However, I think the discussion should be "till when" rather than "yes/no". You can see in the graph above how the richest countries have grown significantly. Maybe it's time for them to slow down and focus on metrics other than GDP. The graph also tells us how some countries need to grow really fast, especially those with substantial poverty.
Simon Kuznets, the economist who developed the first comprehensive set of measures of national income, noted the limitations of GDP in his 1937 report to Congress:
The valuable capacity of the human mind to simplify a complex situation in a compact characterization becomes dangerous when not controlled in terms of definitely stated criteria. With quantitative measurements especially, the definiteness of the result suggests, often misleadingly, a precision and simplicity in the outlines of the object measured.
... additional difficulties will be suggested to anyone who wants to penetrate below the surface of total figures and market values. Economic welfare cannot be adequately measured unless the personal distribution of income is known. And no income measurement undertakes to estimate the reverse side of income, that is, the intensity and unpleasantness of effort going into the earning of income.
In 1962, he further added:[2]
Distinctions must be kept in mind between quantity and quality of growth, between costs and returns, and between the short and long run. Goals for more growth should specify more growth of what and for what.
My problem is that we don't have a discussion on quantity and quality, on the short run and the long run, and on the cost of "reversal", if reversal is possible at all. None of that is factored into the calculation of GDP.
What counts as "work"?
Take, for instance, the stark contrast in how Sweden and India approach unpaid labor. The following graph is from the beautiful Atlas of the Invisible.
On the left, we have Sweden — the most balanced country, where the average amounts of paid and unpaid work for men and women are close: women still do 125% more unpaid work. On the right, we have India — the least balanced country, where the averages for men and women are miles apart: women do nearly six hours of unpaid work a day whereas men do less than an hour.
In Sweden, policies and societal norms acknowledge and strive to compensate for this often-invisible work — such as caregiving and household chores — which is crucial for societal sustenance. Conversely, in India, traditional gender roles often dictate that this labor remains unrecognized in economic terms, despite its fundamental role in the fabric of daily life.
This unpaid labor, although not accounted for in a country's GDP, is essential for the functioning of society.[3][4]
This disparity invites a broader question: how do we value labor and life? Bhutan's pioneering Gross National Happiness index serves as a reminder that there are alternatives to GDP that encompass the richness of human experiences.[5] This index includes measurable factors like psychological well-being, health, education, and environmental quality, providing a more holistic snapshot of societal health.
Such metrics dare to quantify the qualitative, challenging traditional capitalist metrics by including the welfare and happiness of the population. The documentary film "Agent of Happiness" follows Bhutanese bureaucrats who survey citizens about their level of happiness. Specifically, it follows one bureaucrat named Amber Kumar Gurung as he asks people 148 questions to assign them a happiness score from 0 to 10.
Benefits of Slow Life: Peace
However, beyond the metrics lie the lived realities of individuals. Michelle Huang's reflections on her move to rural Japan offer a poignant narrative on value and values. She speaks of a deeper connection to consumption, where waste management becomes a personal responsibility, not an abstract service. This echoes a sentiment that perhaps true value lies in our consciousness of our impact on the world and the legacy we leave behind.
Huang's journey is a microcosm of a larger narrative that questions the hypnosis of societal competitiveness — a crucial ingredient of capitalism. It's an existential vertigo that forces us to confront our priorities and desires, to distinguish between what is authentically sought after and what is imposed upon us by societal expectations.
Beyond Capitalism
I'm definitely not the first person to think in these terms. I was motivated to write my thoughts down upon reading this simple piece by Claire Elise Thompson on Grist. The European Parliament organised the Beyond Growth conference in Brussels, which was attended by thousands. Then there's also "Slow Down: The Degrowth Manifesto" by Kōhei Saitō, which I intend to read soon.
But the lessons are clear to me:
In our global reassessment of value, we must consider the teachings from these varied experiences. From Sweden's recognition of unpaid work to India's cultural norms, and from Bhutan's happiness index to the simple, yet profound, life in rural Japan, there are lessons to be learned. We must craft economic systems that not only measure but also honor the full spectrum of human activity and happiness.
Capitalist metrics may have served us well in one era, but as we evolve, so too must our systems. It's time to redefine what we value and find new ways to measure the true wealth of nations: the well-being of their people.
View my Are.na channel on Economics et al. →
[1] Stagflation in the 1970s effectively disproved the simple Phillips Curve. In fact, it spurred a debate on the role of expectations in the economy. Raghuram Rajan, who has worn many hats — RBI Governor, Chief Economist at the IMF, and now professor of finance at Chicago Booth — correctly identified the role of expectations on inflation in India.
While criticized at the time, he managed to bring inflation expectations to 4 ± 2% in India — a significant feat for a developing country.
Nevertheless, most central banks, including the Fed and the RBI, still rely on it heavily in deciding their monetary policy.
[2] Simon Kuznets. "How To Judge Quality". The New Republic, 20 October 1962. PDF.
[3] Data source: OECD, according to the book. Though India is not an OECD country, so I'm not sure where the non-OECD countries' data was sourced from.
[4] This point also reverberates in recent news, where women are avoiding pregnancy because childcare costs are astronomical and family support — taken for granted for decades — is non-existent in capitalist countries like the USA.
[5] It is kind of funny that Bhutan decided to use "Gross National Happiness" over GDP in the 1970s but didn't actually start measuring its happiness until 2008. Movie rec on the topic: Agent of Happiness.
","permalink":"/against-capitalism/","summary":"In this post, I wonder about capitalism by questioning the adequacy of GDP as a measure of success, highlighting the neglect of unpaid labor, and exploring alternative indicators like Bhutan's Gross National Happiness index.","title":"Against Capitalism"},{"content":"OpenAI's GPT is a terrific idea and a huge improvement over vanilla language models.
Language models as a technology have incredible power. However, a model's knowledge and memory are limited by its training. If the model was trained with data up to 2023, it only "knows" what happened before that. All such information is captured in its long-term memory, from which it is retrieved. Each key piece of knowledge is encoded within the weights of the neural network.[1]
However, recalling from memory isn't a great way to answer questions. We humans are the best example of this. I remember from my Developmental Economics class that India significantly reduced poverty because we started using actual economic measures like consumption and expenditure, instead of relying on recall-based surveys.
Similarly, language models hallucinate when asked to answer a specific question. More recently, OpenAI has been preprompting our questions to avoid exact factual recall.[2] OpenAI's solution to this lack of knowledge is custom GPTs — language models tailored to a specific task.
Bard was clearly an innovator in integrating a wide range of services, thanks to Google's virtual monopoly on searching and communicating useful information. It could easily connect to Gmail, Drive, YouTube, Maps, Docs, Flights and a host of Google services. However, the performance was subpar. It would hallucinate a lot and often forget that it had the capability to talk to other Google services. (It almost always remembered to do that when told.)[3]
OpenAI realized that the key to unleashing potential is opening up. Just as the App Store and Play Store democratized the capabilities of phones and internet access, a marketplace for LLMs could foster innovation like nothing else. I'd say that's exactly what's happening.
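Mechanically, most custom GPTs are little more than a task-specific system prompt plus extra knowledge injected into the request; retrieval and tool calls get layered on top. Here is a rough sketch of the core pattern with the openai Python SDK (the transcript file and prompts are hypothetical; this is my illustration of the idea, not OpenAI's implementation):

```python
# The core pattern behind most custom GPTs: a task-specific system prompt
# plus injected context. Real GPTs add retrieval, file search, and tools;
# this sketch only shows the grounding idea. File and prompts are made up.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = open("video_transcript.txt").read()  # hypothetical data source

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Answer strictly from the provided YouTube transcript. "
                    "If the answer is not in it, say 'not in the video'."},
        {"role": "user",
         "content": f"Transcript:\n{transcript}\n\nQuestion: "
                    "What does the speaker say about measuring poverty?"},
    ],
)
print(response.choices[0].message.content)
```

Because the answer is grounded in context sitting inside the prompt rather than in the model's weights, hallucination drops sharply; that is exactly why the transcript-injection trick behind the first GPT below works so well.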
I often find myself using these specialized GPTs far more frequently than the base model. There seems to be one for most practical purposes, and they're quite easy to search for too!
Here are some of my favourites:
Chat with Video Pro: You can chat with YouTube videos directly and ask specific questions about a video. It pulls the entire video's transcript and injects it into the language model's short-term memory as context. Thus, the chances of hallucination are significantly lower. (Google says Bard can also talk to YouTube. In my experience, it hallucinates too often and forgets half the time that it can do that.)
U.S. Immigration Assistant: The situation as an international student/worker in the USA is not great; too much depends on your visa status and lady luck. I use this for answering my most random questions. It knows more about the subject than the base model. (But it is not the same as an actual lawyer.)
MixerBox Calendar: A Google Calendar AI assistant. You tell the GPT what you want, and it can tell you what's on your calendar and schedule meetings, with the ability to customize every detail possible.
There is one for almost every use conceivable.
Creating one is ridiculously easy. You tell ChatGPT what you want to create, supply the extra source of information, and voila! In fact, Ava from PerfectPlaces.Cool — a project Dea and I have been procrastinating on long enough — was hard to implement natively.[4]
Ava has now been transformed into a sophisticated chatbot, capable of guiding visitors through a curated map of unique, crowd-sourced spots. By leveraging the power of GPT, Ava provides personalized recommendations with an almost uncanny insight, making the discovery of "tiny perfect things" an adventure in itself.
You can try Ava here. She can plan your visits to a city like no one else. Her recommendations are based on crowd-sourced information, not paid advertisements.
[1] More commonly known as the "parameters" of a language model.
[2] The starkest difference is in how many fewer actual facts there are in ChatGPT's answers to historical questions. In the original iteration, GPT-4 would attempt to get the right years, impacts, etc. More recently, it is turning into a novice summarizer.
[3] Like Hanuman forgetting his powers and remembering them again when Jambavan, son of Brahma, reminded him of them.
[4] Although we tried, and it's not particularly bad: https://perfectplaces.streamlit.app/
","permalink":"/openai-gpts/","summary":"OpenAI's GPT represents a significant leap forward in language model technology, moving beyond traditional limitations by incorporating mechanisms like preprompting to reduce factual inaccuracies. The introduction of specialized GPTs for targeted tasks has enabled more accurate, context-specific interactions","title":"OpenAI's GPT is a terrific idea"},{"content":"Some 2,500 years ago, a disciple complained to the Buddha about his restless mind. The Buddha told him a parable of a monkey and a tree. In the story, a monkey climbs a tree and starts eating its fruits. Soon, he thinks another tree might have better fruits, and he jumps to it. This cycle continues, leaving the monkey never truly satisfied.
The Buddha explained that the monkey is like our mind, constantly jumping from one thought or desire to another, in a restless search for satisfaction.
A mind that hops from thought to thought, craving constant stimulation, rarely finds contentment.
How, then, do we find peace amid this incessant activity? The answer lies in understanding the very fabric of our thoughts, for if you understand how the mind works, it will bow down to you.
Our mind (not the brain per se) has four parts:
Sangya
Smriti
Vedna
Sanskar
Four parts of the mind according to the Buddha.
Sangya
The first is Sangya. Sangya is the initial perception or recognition of an object. This is the part of the mind that identifies what an object is, based on the inputs from the sense objects. It functions on direct inputs from our sight, smell, sound, taste and touch.[1]
Imagine Sangya as the lens of a camera, capturing the world in snapshots. Just as a camera instantly recognizes and frames a scene, Sangya identifies and labels our sensory experiences, delineating a 'tree' from a 'car'.
Smriti
The second is Smriti. Smriti is based on the memory of our past experiences. This faculty of the mind recalls all previous instances similar or related to the current input from the sense objects. It is through Smriti that we remember our past associations, reactions and biases toward objects.
Consider Smriti as an extensive, meticulously organized library. Each book on the shelf represents a memory, a past experience. When a new sensory input arrives, Smriti swiftly flips through these volumes, retrieving relevant past associations and knowledge.
Vedna
The third is Vedna. Vedna is the feeling or sensation that arises in response to the perception and the memory. Vedna can be pleasant, unpleasant, or neutral. These sensations don't just happen in our "brain"; they also happen in our body.[2]
Vedna is like a compass, guiding us through the landscapes of pleasure, pain, and equanimity. It's not just an internal gauge; it resonates through our body, like the warmth of the sun on our skin or the chill of a breeze.
For example, meeting someone we don't like can result in unpleasant sensations. Seeing butter chicken with garlic naan might generate a pleasant sensation. Recall how your mouth waters when you hear about a delicious food.[3]
Sanskar
The final one is Sanskar. These are mental formations developed in response to Vedna. If the feeling or sensation was pleasant, a craving for (or an attachment to) those sensations arises. If the feeling or sensation was unpleasant, an aversion (or repulsion) arises. Over time, these Sanskars become deeply ingrained in our mind. Some of them are conscious, but most are subconscious or unconscious.
Think of Sanskar as a sculptor, tirelessly shaping our character and habits. Each experience leaves its mark, carving out patterns of likes, dislikes, and deep-seated tendencies within our mind.
These cravings and aversions leave our mind wanting all the time. We remain full of desires and never satisfied. A true seeker would train his mind to remain equanimous toward pleasant and unpleasant sensations alike, avoiding the creation of new habit patterns of the mind.
How do the four parts work together?
Sangya, our mind's lens, captures the world in vivid detail, paving the way for Smriti to sift through the archives of our past, bringing forward memories and learned reactions. This recognition and recollection stirs Vedna, our inner compass, evoking a spectrum of positive, negative or neutral sensations that color our moment-to-moment experience.
These sensations, in turn, sculpt our Sanskar, the silent artist, etching habit patterns of behavior and reaction deep within us. Together, these four parts choreograph the dance of our thoughts and behaviors, each step influenced by the previous, shaping the next.[4]
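For the programmers reading this, here is a toy rendering of that loop in code. It is entirely my own illustration (the names, the scoring, everything), not something from the texts, but it makes the feedback structure explicit:

```python
# A toy model of the Sangya -> Smriti -> Vedna -> Sanskar loop.
# Entirely illustrative: the names and scoring are my own invention.

smriti: dict[str, str] = {}    # memory: object -> feeling last associated
sanskar: dict[str, int] = {}   # habit strength: craving (+) / aversion (-)

def vedna(obj: str, recalled: str | None) -> str:
    """A feeling arises from perception plus recalled experience."""
    if recalled is not None:
        return recalled                    # the past colors the present
    return "pleasant" if obj == "butter chicken" else "neutral"

def experience(obj: str) -> None:
    sangya = obj                           # perception labels the input
    recalled = smriti.get(sangya)          # memory retrieves associations
    feeling = vedna(sangya, recalled)      # a sensation arises...
    smriti[sangya] = feeling               # ...and is remembered
    # Sanskar: pleasant feelings deepen craving, unpleasant ones aversion.
    delta = {"pleasant": 1, "unpleasant": -1}.get(feeling, 0)
    sanskar[sangya] = sanskar.get(sangya, 0) + delta

for _ in range(3):
    experience("butter chicken")
print(sanskar)   # {'butter chicken': 3} -- a craving has formed
```

Each pass through the loop both reads memory and writes habit, which is the point: left unobserved, every experience deepens the groove it runs in.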
Parallels with Modern Psychology
The conceptual framework of Sangya, Smriti, Vedna, and Sanskar remarkably mirrors modern psychological constructs.
Sangya aligns with the concept of 'perception' in cognitive psychology, where sensory information is identified and interpreted. Smriti resonates with 'memory systems,' particularly episodic and semantic memory, crucial for recalling past experiences and knowledge. Vedna reflects the 'affective neuroscience' perspective, emphasizing the integral role of emotions in our cognitive processes. Finally, Sanskar echoes the principles of 'behavioral conditioning' and 'habit formation,' highlighting how repeated experiences shape our tendencies and actions.
Being mindful as the way to command our mind
Observing the mind is indeed the easiest way to control it. One can achieve that state through a variety of methods, but being mindful of your sensations (what you're feeling exactly now) at various times of day is possibly the simplest way to start.
A seeker can observe, when they're happy, how the body expresses "happy". Is it the lips stretched wide into a smile, eyes lighting up, perhaps a tingling sensation akin to 'spidey senses', hair standing on end? Or is it a warmth spreading through the chest, a lightness in the breath, a spring in the step?
Each sensation is a note in the song of the mind, a unique expression in the body. By tuning into these sensations, we learn the language of our emotions, gaining insight into and control over the inner workings of our minds.
[1] We are terrible at paying attention to our sensory inputs. We can hardly focus on one sense organ (watching/listening/smelling/etc.) at a time. Meditation helps by increasing the power of our attention. Your mind will still just "do" one thing but can remain naturally aware of everything else.
[2] I think one key observation of the Buddha was that mental actions are as powerful and important as physical actions. Before we physically do an action, we have mentally committed to it and sometimes even envisioned its end.
[3] Sensations are different from emotions. Meeting someone unpleasant may result in restlessness and anxiety, for example — those are the emotions. However, the mind decides to feel restless or anxious based on the physical sensations generated. Those sensations are sweaty palms or a hastened breath.
Observing such sensations objectively can help us avoid blind emotional reactions and help our mind make the right decision.
[4] Law of Karma: each step is influenced by the previous, shaping the next.
","permalink":"/theory-of-mind/","summary":"Four parts of our mind according to Buddha","title":"Theory of Mind"},{"content":"On my first day in Cairo, Egypt (specifically in the vibrant district of Zamalek), I witnessed a striking scene. A young woman, donning a hijab and lost in her music, skillfully rollerbladed 🛼 through the bustling streets. It was 11 pm, and she maneuvered amidst a lively mix of cars and motorcycles. This sight made me question my preconceptions of Egypt as a conservative country. My first shock.
The journey to Egypt marked my first step onto the African continent. The Giza Pyramids, celebrated as one of the seven wonders of the world, had always intrigued me, yet I knew little about Egypt's ancient legacy or its contemporary culture. What was daily life like 4,000 years ago under pharaohs like Tutankhamun, Hatshepsut, and Cleopatra (yes, from Shakespeare's Antony and Cleopatra)?[1]
To bridge the gap between Egypt's past and present, I immersed myself in documentaries on History TV18 and Netflix before embarking on this week-long escapade, reminiscent of an Indiana Jones adventure. Soon after, Meenal and I went on a week-long trip neither of us was fully prepared for.
Zamalek
In Cairo, we stayed in the upscale Zamalek neighborhood, known for its elegant 19th-century apartments, lush streets, and an eclectic mix of cafes and boutiques. Zamalek, transformed significantly after the 1952 revolution, showcases a blend of Egyptian and foreign influences, making it a unique cultural hub.[2]
Many foreigners who lived in Zamalek left during extended periods of political unrest. This exodus provided an opportunity for more Egyptians to move into the area. Furthermore, Nasser's nationalisation policies during the 1950s and 1960s led to the confiscation of many estates, which were repurposed for state use, including being transformed into public schools.
Giza Pyramids
Egyptians believed that just as the sun rises in the east and sets in the west, life should exist on the right (east) bank of the river Nile and death (with the afterlife) on the left (west) bank. Thus, Giza and the pyramids are on the left of the Nile, while Cairo, Luxor, and most other cities are on the east.
The Giza Pyramids stand as a testament to the ancient Egyptians' belief in the afterlife. Anubis, the jackal-headed god, guided souls through twelve moral dilemmas, similar to Yaksha in Indian mythology. The heart of the Pharaoh was weighed against a feather to determine his worthiness for the afterlife. This ritual, overseen by Osiris, the god of the afterlife, is depicted vividly in ancient artworks.
In this papyrus[3] print, Anubis is leading the king through twelve questions (top) and then has his heart weighed against a feather, after which Osiris allows him into the afterlife.
Camel rides are common in the Giza pyramid complex. They are super fun. However, I highly recommend you visit the Giza complex with a guide and your own car (that is, prebook your adventure). Egyptians are kind and good people, but that is not the time to test it.
Our stay in Giza was at a cozy Airbnb where we woke up to a direct view of the Pyramids.
Giza is less developed than Cairo, but our host, Samer, more than made up for it. If you're planning a trip, I highly recommend his assistance.
Museums in Cairo
The Egyptian Museum, with its vast collection of antiquities, holds treasures like King Tutankhamun's artifacts. His reign, though unremarkable, gained fame due to the discovery of his well-preserved tomb by Howard Carter in November 1922.
King Tutankhamun, colloquially known as King Tut, was a pharaoh in the eighteenth dynasty of Ancient Egypt. He ascended the throne as a child at nine years of age and ruled till age 19, when he died of unknown causes; theories range from plague to military coup. While his kingship wasn't particularly remarkable, he became news after his tomb was discovered by Howard Carter, a British Egyptologist, in near-intact condition.
In Ancient Egypt, animal mummification was as significant as that of humans, serving various purposes. The Egyptians mummified animals in four primary categories: beloved pets buried with their owners; mummies of beef ribs, steaks, joints of meat, ducks, and geese intended as eternal food sources; sacred animals born with sacred symbols, revered in religious contexts; and votive offerings like cat mummies, presented to the gods as messengers between people and god. In fact, most people bought mummies from a temple store and took them to the priests.[4]
The mummification process for animals was similar to that of humans, involving desiccation and preservation. The animals were eviscerated, with exceptions like the sacred Mnevis Bull, whose viscera were preserved in canopic jars. The bodies were then dried using natron, massaged with oils, and wrapped in linen for burial. This process, which took up to seventy days, varied slightly depending on the animal's size.[5]
The National Museum of Egyptian Civilization, though smaller in scale, boasted a more curated and enriching collection. Its highlight was the chronological display of royal mummies, offering a fascinating glimpse into the reigns and final resting states of Egypt's ancient rulers. It was a remarkable experience to witness the varied fates of these kings and queens: some bore the scars of war with fractured skulls, while others showcased distinct hairstyles that spoke of their unique identities.
Intriguingly, a few mummies exhibited signs of diseases like plague or polio, evident in their disfigured limbs, providing a poignant reminder of the human vulnerability that even royalty could not escape. I don't have pictures of this, as cameras weren't allowed, to protect the mummies from light exposure.[6]
Luxor, the Capital of Ancient Egypt
Luxor, once the heart of Ancient Egypt, was our next destination. We started with the Karnak Temple 🏛️, dedicated to Amun Re, the sun god. This architectural marvel, with its massive columns and intricate hieroglyphs, reveals the grandeur of ancient worship practices. The colours on these columns remain after thousands of years, which is remarkable.
The Valley of the Kings and Luxor Temple, with their rich history, offered insights into royal burials and religious ceremonies. We also visited the temple of Hatshepsut, Egypt's first female pharaoh.
The ascent of Hatshepsut is a tale of intrigue and ambition. Born in 1507 BCE to Thutmose I and his queen Ahmose, Hatshepsut took an unconventional path to power.
Following her father's death, she married her half-brother, Thutmose II, at a young age. Their union was short-lived, as Thutmose II's death led to the ascension of his underage son, Thutmose III, to the throne.
Initially, Hatshepsut served as Thutmose III's regent, but her aspirations went beyond a caretaker role. She claimed a divine right to rule, asserting herself as the daughter of the sun god Amun Re. This bold claim, accepted by the priests, enabled her to seize the pharaoh's mantle.
Hatshepsut's reign was marked by prosperity and monumental achievements. She expanded trade with Africa, enriching Egypt with exotic flora and fauna, including giraffes that adorned her temple garden. As a visionary architect, she not only constructed the majestic Hatshepsut Temple but also significantly enhanced the Luxor and Karnak Temples.
Hatshepsut's temple, made by carving into the limestone mountains.
Her story takes a dramatic turn with Thutmose III, whom she had sent to a military academy, perhaps hoping he would never return. Contrary to her plans, he emerged as a skilled warrior. It's speculated that he led a coup against Hatshepsut and, after seizing power, attempted to erase her from history. He replaced her images and inscriptions on monuments with his own, although some remnants of her legacy, like the inscription on a tower in her temple, survived his efforts to obliterate her memory.
In some cases, such as this tower, Thutmose III couldn't obliterate her name because it was written "Daughter of Amun Re, Queen Hatshepsut, presents this to the glorious God Amun Re". In such cases, he built additional features, such as the stone barricade that covered her name at the bottom.
We also went for a hot air balloon ride 🎈 over Luxor. It provided a breathtaking perspective of these ancient sites, allowing us to appreciate their scale and beauty from the skies. It was enthralling!
Back to Cairo
As our Egyptian adventure neared its end in Cairo, we embarked on a street food tour, delving into a delightful array of local cuisines, beverages, and desserts. The culinary journey started with a traditional Egyptian breakfast, predominantly vegetarian and packed with nutrients. A typical morning meal might include Khoshuri, Egypt's national dish, or Egyptian bread accompanied by a variety of sauces such as tahini and baba ghanoush.[7]
An Egyptian bakery on the street selling staple bread for breakfast and lunch.
Lunch in Egypt aligns with the return of the household's breadwinner and is typically served between 2 pm and 6 pm. Dinner, if separate from lunch, is often a late-night affair around 2-3 am. This nocturnal dining pattern reflects Egypt's vibrant night culture. Due to the intense daytime heat and desert winds, life in Egypt pulses more vividly after dark. Most shops and markets come alive around noon, buzzing with activity until they wind down around 3 am.
In Egypt, a predominantly Muslim country, public consumption of alcohol is frowned upon. Our encounters with alcohol were limited to a quaint spot in Zamalek and a four-star hotel in Luxor. In stark contrast, smoking shisha (hookah) is a widespread and culturally ingrained practice, readily available in most cafes alongside coffee.
An unexpected and delightful discovery was the popularity of juice shops.
These vibrant hangout spots, offering an astonishing variety of 20-30 types of fresh juices, provided a refreshing contrast to my experiences in the US. These juice shops aren't just about quenching thirst; they serve as social hubs where locals gather in the evenings for lively conversations and 'gupshup'.
Our street food tour group outside a juice shop. You can guess its popularity by the number of people chilling in the background.
Last Day: Coptic Church and Al-Azhar Mosque
On our last day in Egypt, we visited the Coptic Church and the Al-Azhar Mosque, two sites in Cairo's rich religious tapestry. Even as a Hindu, I felt at peace visiting both sites. The Coptic Church is a testament to Egypt's ancient Christian community. Its intricate architecture and revered artifacts reflect a blend of Egyptian and Greco-Roman influences.
Just a short distance away, the Al-Azhar Mosque stood in all its grandeur. It is one of the oldest mosques in Cairo and a beacon of Islamic learning, with its wide assembly areas. The mosque's elegant minarets and detailed Islamic calligraphy were pretty cool.
Egyptian Economy and Elections
Just ten days before we visited, Egypt had national elections. The current president, Abdel Fattah El-Sisi, got re-elected. El-Sisi came to power after the military coup of 2013. When I asked about the elections, someone said "no comments". I got curious and learnt that the government claims 90% of votes went to El-Sisi amid 67% voter turnout. No one I talked to had voted.
The economy was struggling: people cared a lot more about dollars than about Egyptian pounds. My cab driver offered me an exchange rate of $1 = 43 EGP, while the bank exchange rate was $1 = 30 EGP (a street premium of over 40%). With high inflation, people are hoarding dollars to protect themselves from a coming catastrophe. Egyptians are also not allowed to hold more than $300, which makes it very hard for them to travel outside Egypt.
El-Sisi's personal project — a Dubai-like city called the New Administrative Capital (NAC), situated 45 km from Cairo — got disproportionate funding from the government. The NAC was designed to reduce congestion in Cairo, which is already one of the world's most crowded cities. However, because of high rents, an uncertain future, and a lack of local economic options, the project hasn't made significant progress.
Concluding Remarks
Reflecting on my trip to Egypt, I'm struck by the contrast between its ancient wonders and modern realities. From the streets of Cairo to the Giza Pyramids and Luxor's temples, the journey was a deep dive into a rich culture and history.
The museums in Cairo brought ancient stories to life, while the street food tour offered a taste of everyday Egyptian life. The political and economic landscape, glimpsed through discussions about the elections and the struggling economy, added a layer of complexity to my understanding of contemporary Egypt.
Leaving Egypt, I felt enriched by the experience, having seen a country balancing its impressive heritage with the challenges of the present.
P.S. Meenal's Blog and a Playlist of Egyptian Arabic Songs
Meenal also wrote about her experience in her newsletter. Her account is less factual and more about our experience. Hope you enjoy it too! During our stay in Egypt, we fell in love with Egyptian songs. Here's a playlist of the ones we Shazamed!
[1] Cleopatra VII was the last active ruler of the Ptolemaic Kingdom of Egypt. Renowned for her intelligence, political acumen, and allure, she played a pivotal role in the Roman political battles of her time. Alexandria, a major city in Egypt, was the capital of the Ptolemaic Kingdom and Cleopatra's seat of power.
Cleopatra's relationship with Julius Caesar began when she sought his support against her brother and co-ruler, Ptolemy XIII. Their alliance quickly turned romantic and political. Caesar's support helped Cleopatra regain the throne. Their liaison also produced a son, Ptolemy XV, popularly known as Caesarion, who Cleopatra claimed was Caesar's heir. After Caesar's assassination in 44 BC, Cleopatra aligned with Mark Antony, leading to further entanglements in Roman politics, which eventually culminated in their defeat and her subsequent suicide. Cleopatra's life and reign signified the end of the Ptolemaic dynasty and the beginning of Roman dominion in Egypt.
I didn't visit Alexandria, though. Some other time.
[2] Ziad A Akl (Zamalek: Social history of an island, Ahram Online): Zamalek was originally built in the 19th century as a place close to the King or Khedive or Wali's headquarters in Abdeen Palace in Downtown Cairo, where those who worked for the palace could find cheap accommodation close to their occupations. At that time, the price of land in Zamalek was cheap compared to other regions of Cairo. Many of the Abdeen Palace service staff bought land in Zamalek; most of them were from Upper Egypt. It appears Zamalek has been an expatriate island for many years.
[3] Papyrus, a cornerstone of ancient Egyptian civilization, was first used around 3000 BCE for writing and making scrolls. Derived from the pith of the Cyperus papyrus plant, it was processed by cutting the stem into thin, flat strips, which were then laid in overlapping rows and hammered flat, producing a durable, flexible writing surface. This innovation facilitated record-keeping, religious texts, and literature.
Scribes used reed brushes or pens with carbon-based ink to write on papyrus. The scrolls, central to administrative, legal, and scholarly work, were often stored in libraries like the famous one at Alexandria. The use of papyrus in Egypt gradually declined with the introduction of cheaper parchment and paper, but it remains a symbol of the country's rich cultural and intellectual heritage.
The modern word "paper" is actually derived from "papyrus".
[4] Cats, which were the most common votive offering, were typically "farmed" by temple-permitted locals, who snapped their necks when the cats grew old enough. Their bodies were then mummified and sold to pilgrims coming to the temple. There was a dark economy: scammers sometimes sold empty caskets to noobs.
[5] The mummification process has been studied in detail (https://www.si.edu/spotlight/ancient-egypt/mummies) and has even been repeated.
[6] Mummification is not unique to Egypt — it has been practiced, deliberately and naturally, on all continents. Deliberate mummification was a feature of several ancient cultures in areas of America and Asia with very dry climates.
The Spirit Cave mummies of Fallon, Nevada, in North America were accurately dated at more than 9,400 years old. Before this discovery, the oldest known deliberate mummy was a child, one of the Chinchorro mummies found in the Camarones Valley, Chile, which dates to around 5050 BC. Currently, the oldest known naturally mummified human corpse is a severed head dated as 6,000 years old, found in 1936 AD at the site named Inca Cueva No. 4 in South America.
[7] Baba ghanoush is very similar to an Indian dish called baingan bharta.
","permalink":"/egypt/","summary":"Exploring Egypt from Cairo's Zamalek to Luxor's temples, juxtaposing its rich history with contemporary life and political-economic challenges.","title":"Egypt: An Odyssey Through Time"},{"content":"Since its launch, ChatGPT has taken the world by storm. While many are afraid of losing their jobs (which likely will happen), many of us are thinking of using these tools to upgrade the quality of our existing work. Over the last several months, I have discovered numerous tools like that; let's take a look at some interesting ones that I frequently visit.
Perplexity
Perplexity.ai is a search tool like Google, but with GPT's knowledge base added on top. Things I have found it best for:
Deep search, when I don't remember the keywords to search on Google.
Specific knowledge about a subject beyond ChatGPT's knowledge cut-off, such as recent news and developments.
Copilot mode, when I'm not sure of the limitations of my own thinking and want the AI to act collaboratively: discuss options with me and choose based on my preferences.
Their "Discover" tab is also a good place to find interesting news from around the internet. Perplexity also has many other LLMs that you can try in their Playground. Mistral-7b-instruct in particular was famous for having very light safety checks on prompts and responses.[1]
Phind
I first heard about Phind from Paul Graham on Twitter/X, who shared how Phind beat GPT-4 at programming, and was 5x faster! Furthermore, it supported 16,000 tokens of context while GPT-4 only supported 4,096. That's not all — it could even search the internet, which means it was amazing at answering documentation questions about just-launched software.
Situations where I find Phind most useful:
Coding. It is generally better than GPT-4 and can even search the internet! This is necessary when working with packages that got a recent update.
Since it has four times the context window of GPT-4,[2] I can put in an entire app or function for its perusal, to poke holes in my project. When you're debugging, sometimes it's not clear what the source of the error is — putting the complete project in front of it helps.
Pair programmer mode: when you choose this, the chatbot is conversational and much better for debugging.
In my experience, Phind generally gives multiple methods to complete a given task, unlike GPT-4. It not only helps me learn the alternative ways to achieve the objective; I also learn the limitations of each of them.
The responses also come with search results, in case you're interested in learning more.
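One practical habit all these context windows suggest: count tokens before pasting a document, so you know whether it fits GPT-4's 4k window, Phind's 16k, or the 200k window of Claude (discussed next). A quick sketch using pypdf and tiktoken; tiktoken tokenizes for OpenAI models, so treat the count as only a rough proxy for anything else:

```python
# Check whether a document fits a model's context window before pasting it.
# tiktoken counts tokens for OpenAI models; for other models (e.g. Claude,
# which uses a different tokenizer) treat the number as a rough estimate.
from pypdf import PdfReader
import tiktoken

pages = PdfReader("paper.pdf").pages                      # any PDF you like
text = "".join(page.extract_text() or "" for page in pages)
tokens = len(tiktoken.encoding_for_model("gpt-4").encode(text))

for model, window in {"GPT-4": 4_096, "Phind": 16_000, "Claude 2": 200_000}.items():
    verdict = "fits" if tokens <= window else "too long"
    print(f"{model}: {verdict} ({tokens:,} tokens vs {window:,} window)")
```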
Claude
When Claude.ai (with Claude 2) was launched this summer, I immediately realised how prone to hallucinations it was. Like other LLMs, it made up stuff confidently. But the rate at which it made up stuff was matched only by Google's Bard.
Nevertheless, Claude's strength is its ability to handle really long contexts, up to 200,000 tokens. That is almost 12x other models. Furthermore, it can process PDF, TXT, CSV and DOCX files.
I use Claude mostly to summarize information from PDFs. This is great for pulling information out of really long documents that I don't have the will to read. Its performance in programming is abysmal, and I wouldn't trust its factual responses. Do verify.
Side-note on hallucinations: GPT-4 hallucinates the least, only around 3% of the time. That's better than most humans, in my opinion. Claude hallucinates around 8.5% of the time. Google's Palm 2 has a 27% hallucination rate. See the leaderboard.
Typeset / Scispace
ChatPDF and its cousins took over the internet around February. Many indie developers created their own versions to chat with PDF files, but the quality of all of them wasn't as good as we had hoped. For my research area — machine learning — they had shallow knowledge, no depth. If I asked a question about a paper, they would pick up the keywords and generate a plausible-sounding response with those words. Did they answer my question? Largely, no.
Typeset.io is a big improvement.
Literature review: you can ask research questions directly, and it will give a response with research papers as citations.
Read with AI: it can explain math! You can upload any PDF paper and it will start a chat on the paper. You can highlight any piece of text or math, and it will explain it to you. Granted, it's not the same as reading the paper itself, but it's much better than ChatPDF.
It comes with a Chrome extension that provides Read with AI options on most journals' websites.
It also doubles as a paper management tool. You can gather papers into collections, create BibTeX citations, and more. Google Scholar++.
Why not Bard?
Because it simply doesn't make the cut. Bard (with Google's Palm 2 Chat) has the highest hallucination rate (27%), cannot do math at all,[3] and, even with access to all my data, keeps giving summaries of things not in my emails, details about events that don't exist, and travel advice for Knoxville, Iowa instead of Knoxville, Tennessee. Its image recognition is funky.
A Redditor summarised it well:
Bard acts as a tired old employee who's about to retire sometime soon and doesn't give a fudge about its work. Just wants to end its shift and go back home to have dinner-and-a-movie alone on its couch like every night, and will continue doing that after retirement... like it was born tired... applying the law of minimum effort to all its answers...
With Gemini released this morning, Google says it has been able to catch up to GPT-3.5. Let's see. Haven't tested it yet.
ChatGPT is still the King
ChatGPT is still the best AI chatbot for general-purpose use. I find myself using the Voice Chat functionality a lot — to brainstorm ideas and hear them out loud. Its speech-to-text (Whisper) and text-to-speech are much better than ANYTHING I've seen in any other tool. It has an amazing ability to understand images, better than everything else I know.
[1] When it was launched, it gave explicit instructions on how to commit suicide, for example. Now, it declines to answer the question.
[2] OpenAI has since released GPT-4 Turbo, which has a 128k context window, so this point isn't that big anymore.
[3] It gives a list of 13 players when asked for the best 11 cricket players. I also give it a serious penalty for not including MS Dhoni or Sachin Tendulkar — two of the best players. ChatGPT gives me 11 players and includes both of them.
","permalink":"/four-ai-chatbots-other-than-chatgpt/","summary":"General purpose AI Chatbots have taken over the world in recent years. Here I take a look at four chatbots that I use quite often in my daily life.","title":"Four AI Chatbots other than ChatGPT"},{"content":"
I am trapped inside Apple's beautiful garden. It is beautiful, but it is locked.
Why is it locked?
I have an iPhone.
And an Apple Watch.
And an Apple Card.
And AirPods.
And HomePods — two of them, actually — which have 10/10 spatial audio.
And an iPad — with an Apple Pencil — that I hardly use.
And, of course, my MacBook Air.
Now, imagine I want to try an Android phone.
Oops, the watch doesn't work anymore.
Oops, my credit score just hit rock bottom due to closing a high-value credit card.
Oops, the AirPods are now classic Bluetooth earbuds with a critically low battery.
Oops, the HomePods are useless.
From a $1000 expense to buy a new phone, it is now almost $3000 to replace the phone, earbuds, and speakers.
I still need to decide what to do with my existing devices.
Sell them? Donate them?
Apple will only give me credit for buying more from Apple.
What about the iPad?
Yeah, that's pretty independent.
I can totally move to a Surface Pro or a Samsung Note.
Except then I have to decide what to do with the $150 Apple Pencil and the $250 keyboard, which will be useless without the iPad.
My Mac seems to be the only one that can exist independently.
Why?
Probably because its foundations were laid before Apple became an ecosystem company.
Before it locked me in its beautiful garden.
I had come to take a stroll around the fountain of MacBook. Now, I'm stuck in the quicksand of Apple.
Which, I reiterate, is beautiful. But I'm not sure I want to be stuck here.
I value my freedom.
","permalink":"/apple-walled-garden/","summary":"A poem on how I'm trapped in Apple's Walled Garden","title":"Trapped in Beautiful Garden"},{"content":"It gave me a tonne of happiness to talk to IPM students on "Why Research?" at IIM Indore last week!
Thanks a tonne to Prof Ajit Phadnis, who invited us for a panel discussion in his "Research Methodology" course. I shared the dais with Divyanshu Kukreti and Vanshika Chaudhary, learning from their research experiences and sharing mine.
Hope to see more IPMers thinking of research as a career option, a road less travelled in business schools.
Here are the slides:
Why Research - Slides PDF
","permalink":"/why-academia/","summary":"Panel discussion on academic research as a career choice at my alma mater, IIM Indore","title":"Why Academia?"},{"content":"Warren Hastings deserves a special mention in the history of Colonial India.
While Robert Clive thought of India as nothing more than a treasure trove ripe for plunder to exhaustion, Hastings saw an ancient and venerable culture that deserved praise, respect, and, most importantly, the attention of the West.
I love India a little more than my own country.
— Warren Hastings
He became fluent in Urdu, Persian, and Bengali, and acquired a good working knowledge of Sanskrit, though he never really mastered it. He sponsored the first English translation of the Bhagavad Gita. He helped christen the "Asiatic Society" with Sir William Jones, who started a series of translations of Indian classics from Sanskrit to English.
While the Brahmins refused to teach Hastings (and Jones and virtually every White guy) Sanskrit in the name of religion, Hastings found a doctor who taught him enough Sanskrit to start reading the Ramayana, the Mahabharata, and Kalidas' Abhigyan Shakuntalam. Jones' translation of Shakuntalam, which he called Sacontalá or The Fatal Ring, became widely popular in Europe. Johann Wolfgang von Goethe, a towering figure in German literature, published an epigram about Shakuntala in 1791, and in his Faust he adopted a theatrical convention from the prologue of Kālidāsa's play.
Hastings also realized that the Indian laws were highly inadequate for the country's needs. Royal courts didn't have enough resources or even uniform laws; cases were judged by a panch, local elders brought together as a jury. With help from Jones' team, he had the Manusmriti translated into English, and it became a leading document for Indian (specifically Hindu) personal laws. Even today, Indian laws for Hindus on marriage, adoption, and taxes derive from this book, written between 200 and 150 BC.
The Asiatic Society also translated other crucial works: the Hitopadesha ("Beneficial Advice"), tales of wisdom written in Sanskrit between 800 and 950 CE; the Rigveda and other Vedas, composed before 2000 BC; the Ain-i-Akbari, the Mughal Emperor Akbar's biography; and the Jataka Tales, tales from the Buddha's times written around 500 BC.
Hastings' love for India was in sharp contrast to Clive's, as I mentioned before. Clive, who had garnered a huge fan following in the East India Company and the British Parliament after his win in the Battle of Plassey, saw Hastings as an adversary whose approach of "understanding" India needed to be checked.
Hastings, respectful of indigenous customs and traditions, was crudely labelled an "Orientalist", while others were inspired by the Whig brand of westernising the "devil's land". Hastings' impeachment trial began soon after, in 1787, and lasted till 1795; it has been dubbed "probably the British Isles' most famous, certainly the longest, political trial".
He was caught in a whirlwind of political attacks from all directions: the Whigs, led by the famous philosopher and \u0026ldquo;statesman\u0026rdquo; Edmund Burke, were a faction trying to embarrass William Pitt\u0026rsquo;s government, all intensified by the climate of suspicion created by Robert Clive.\nIn the cartoon by James Gillray, Warren Hastings is portrayed as the \u0026ldquo;Saviour of India,\u0026rdquo; while exaggerated figures resembling political adversaries Burke and Fox assault him, symbolizing the intense political conflict and allegations of corruption that marked his time as Governor-General of India.\nAt Hastings's impeachment, his treatment of the Oudh queen-widows (Begums) provoked some of the most emotive outbursts from Burke, always a champion of princesses in distress. His speeches were full of \u0026ldquo;love-passion\u0026rdquo; for the wronged Begums, and even seasoned MPs could not recollect weeping \u0026ldquo;so heartily and copiously on any public occasion\u0026rdquo;.\nHastings was accused of personally instigating physical torture of the imprisoned eunuchs and of starving the Begums into submission. The accusations against Hastings were unfounded. The Begums, showing no ill will, sent messages of support during his trial, which he later used as evidence to counter the impeachment charges.\nThomas Macaulay, who destroyed the Indian education system in the name of \u0026ldquo;modernizing\u0026rdquo; it, was highly critical of Hastings. He accused him of becoming \u0026ldquo;beloved by both the subject many and by the dominant few\u0026rdquo; and having \u0026ldquo;enjoyed among the natives a popularity\u0026hellip;such as no other governor has been able to attain.\u0026rdquo; How is this an accusation?\nIt\u0026rsquo;s important to note that my appreciation of Hastings isn\u0026rsquo;t an endorsement of the British rule in India. The East India Company and the British government destroyed India through all means possible \u0026mdash; but one has to accept that Hastings\u0026rsquo; policies were a refreshing change. Someone did recognize the ancient culture of India and its rightful place in the world.\nHe survived the trials and finally received a standing ovation from the parliament. His final speech went:\nIndians have been misrepresented as sunk in the grossest brutality and defiled with every abomination, thereby justifying British attempts to reform them, nay to 'coerce' them into goodness. It will be better to leave them as they are\u0026hellip;\nAmong the natives of India, there are men of as strong intellect, as sound integrity and honourable feelings, as any of this Kingdom. I regret that they are not sufficiently noticed, sufficiently employed nor respected\u0026hellip; Be it your Lordship's care\u0026hellip;to lessen this distance\u0026hellip;and by your example make it the fashion among our countrymen to treat them with courtesy and as participators in the same equal rights of society\u0026hellip;\nThe parliament got the wrong guy. It should have been Robert Clive.\n","permalink":"/warren-hastings/","summary":"East India Company\u0026rsquo;s first Governor General Warren Hastings held deep respect for the Indian culture and was widely respected by the company officials and the local Indians.
In contrast to his predecessor Robert Clive, who exported only jewels from India to Britain, Hastings exported oriental wisdom through religious texts including the Mahabharata, Ramayana, and Vedas.","title":"Warren Hastings, The First Governor General of India"},{"content":"When \u0026ldquo;Clive of India\u0026rdquo; gained importance in the East India Company \u0026mdash; and managed to survive more than two years in India, where most Company officers died from diseases, the change in climate, and the like \u0026mdash; he realized the importance of India in making Britannia, more specifically him, rich. To him, India was little more than a treasure of riches, waiting to be \u0026ldquo;looted\u0026rdquo; and plundered to exhaustion.\nRobert, Lord Clive. 1764. Oil on canvas by Thomas Gainsborough. Source: National Army Museum, London.\nWith the Seven Years War (1756-63) ongoing, the British government didn\u0026rsquo;t want to lose to either the French or the Spanish. News arrived from their intelligence of a possible attack by the French navy on one of the colonies. While the British government assumed it was India, more specifically Bengal, the French actually attacked Canada. All these extra military resources and weaponry reached the hands of Robert Clive, an East India Company official.\nClive\u0026rsquo;s first mission was to support the British in the war by attacking the French post in Chandernagar. The Nawab had tried to send help, but the governor of Hooghly was bribed to remain inactive and hold back the Nawab\u0026rsquo;s reinforcements to Chandernagar.\nJagat Seth et al., a prominent Indian banker group led by a man whose name literally translates to \u0026ldquo;World\u0026rsquo;s Banker\u0026rdquo;, realized what was happening and inspired Clive to attack the Nawab of Bengal, Siraj ud-Daulah. Jagat Seth\u0026rsquo;s methods were quite advanced for their time. In 16-17th century India, it wasn\u0026rsquo;t easy to transfer wealth \u0026mdash; gold, wheat, and even taxes \u0026mdash; from one city to another. While the Mughal Emperor Akbar had established certain protections for traders and travellers en route between major cities, it was nowhere close to \u0026ldquo;safe\u0026rdquo;. There was enough risk that Jagat Seth\u0026rsquo;s plan worked.\nJagat Seth\u0026rsquo;s group had bankers in most major cities in India, including Kolkata, Delhi, Mathura, Allahabad, and Agra, among others. Instead of physically transferring the wealth, traders could carry a \u0026ldquo;Certificate of Deposit\u0026rdquo; from one city\u0026rsquo;s bankers which could be encashed in the second city. Usually charging around 15-20% commission on the gross amount, they made huge profits.\nJagat Seth promised Robert Clive that his group would not only fund the attack against the Nawab, but would also pay the company and Mr Clive a million pounds each. To give you some perspective, that is almost a billion dollars in today\u0026rsquo;s currency. Clive couldn\u0026rsquo;t possibly say no; it was too good an offer to let go. If he lost, he would still have enough for an early retirement; if he won, he would have enough for his coming three generations, and then some.\nWhat did Jagat Seth have against the Nawab?\nNo one particularly liked Siraj ud-Daulah. He enjoyed watching people drown by throwing them overboard. He lived a lavish life and made unpredictable changes to the laws of the land, based on his whims.
Jagat Seth didn\u0026rsquo;t like it \u0026mdash; it was bad for his business \u0026mdash; and he preferred dealing with the East India Company, which still needed reliable partners to trade in an unfamiliar territory.\nJagat Seth had higher confidence in recovering loans made to the Company than loans made to the Nawab. They both understood the concepts of finance and contracts. This became a lot more obvious a few decades later, when the Marathas and the Company had almost the same military might but the Company won simply because it could keep encashing Bengal and easy loans from Jagat Seth.\nThe British had been losing for the first few days, thanks to the Nawab\u0026rsquo;s superior army of 50,000 soldiers, 40 cannons and 10 war elephants; the British had only 30,000 men.\nOn June 20, 1756, after Siraj ud-Daulah\u0026rsquo;s army stormed Calcutta and the British lost badly, the Nawab captured British soldiers at Fort William, Kolkata, and confined them in a small dungeon known as Kaala Paani or the \u0026ldquo;black hole\u0026rdquo;. John Zephaniah Holwell, the chief magistrate, claimed that 146 captives were crammed into an 18 by 14 feet space, though modern estimates suggest about 65 were present. The cell had only a small window for light and air, leading to severe dehydration and crushing among the prisoners. By the next morning, when the cell was opened, Holwell reported that only 23 survived, describing the survivors in dire terms.\nBlack hole of Kolkata memorial in present day Kolkata. Atlas Obscura.\nSo how did Robert Clive win?\nJune is the month of the monsoon. It\u0026rsquo;s the time of rain. As nature had it, it rained heavily for the next two days, and the British covered their gunpowder and artillery to save it from the rain while the Nawab\u0026rsquo;s army couldn\u0026rsquo;t. Mir Jafar, Siraj ud-Daulah\u0026rsquo;s commander, betrayed him too. As soon as Clive realised that the Nawab\u0026rsquo;s army had crucially missed covering its artillery, he attacked early in the morning and defeated the Nawab with almost zero resistance. Siraj ud-Daulah\u0026rsquo;s failure to bring tarpaulins rendered his cannons inoperable in the rain. Assuming the British faced the same issue, his general, Mir Madan, initiated a cavalry attack. However, the British cannons were still functional, and their counterattack fatally wounded Mir Madan and significantly impacted the Nawab\u0026rsquo;s army.\nProcess Engraving after Richard Caton Woodville, 1900. Source: National Army Museum, London.\nSoon after, the Nawab was forced to sign away the Diwani \u0026mdash; the East India Company now had the right to tax the state. It was the dawn of colonial rule in India.\nSiraj ud-Daulah met his terrible fate at the hands of Miran, the son of Mir Jafar. Mir Jafar was appointed the new Nawab of Bengal, but he functioned as a puppet ruler for the Company. The mistake of trusting the Company became clear a decade later, when Mir Jafar\u0026rsquo;s successor Mir Qasim took up arms against it in the Battle of Buxar in 1764, but the East India Company\u0026rsquo;s army defeated the combined forces of the Nawab of Bengal, the Nawab of Awadh, and the Mughal Emperor.\nRobert Clive and Mir Jafar after the Battle of Plassey, 1757 by Francis Hayman. Source: National Portrait Gallery, London.\nFollowing their victory, the treaty acknowledged Mir Jafar as the Nawab of Bengal, while the British gained territorial control within and around the Maratha Ditch, as well as Zamindari rights from Calcutta to the coast.
Mir Jafar agreed to compensate the British for their naval and army losses.\nThe Battle of Plassey was a significant triumph for the British, both politically and economically. Francis Hayman\u0026rsquo;s painting above vividly illustrates this victory, contrasting the robustly fluttering Great Union Flag with the defeated, disheveled green and white flag, symbolizing the contrasting destinies of the British and their opponents.\nThis painting is also the first thing you see if you visit Clive\u0026rsquo;s estate today. The event depicted in the painting never actually happened; the transfer of power was a private event that happened in Clive\u0026rsquo;s war tent. In fact, the painter Francis Hayman never even set foot in India.\n","permalink":"/battle-of-palassey/","summary":"How did the British East India Company, relatively poor in resources, manage to topple the richest kingdom in the world? Was the Company that good, or was it pure luck?","title":"Battle of Plassey and Clive of India"},{"content":" When a measure becomes a target, it ceases to be a good measure.\n\u0026mdash; Goodhart\u0026rsquo;s Law\nEvery waking moment, our brain is incessantly solving complex problems, making decisions that range from the trivial to the transformative. While we might not consciously label this process as \u0026lsquo;optimization\u0026rsquo;, that\u0026rsquo;s exactly what it is. We are optimizing for our \u0026lsquo;happiness\u0026rsquo; (either short-term or long-term). Amidst these myriad decisions, the looming question remains: What are we optimizing for?\nTo unpack this, we first need to understand the tools that guide or obstruct our choices: nudges and sludges. I\u0026rsquo;ll argue that our optimizations don\u0026rsquo;t serve us well, particularly when happiness is the metric. The idea of nudges and sludges comes from Nudge: Improving Decisions About Health, Wealth, and Happiness by Dr Richard H Thaler.\nA nudge is an element or feature that subtly guides human behavior towards a particular action without limiting choices or significantly changing economic incentives. For example, placing healthy foods at eye level in a grocery store is a nudge to encourage healthier eating habits. Nudges can be powerful tools for encouraging positive behavior while preserving freedom of choice.\nOn the other hand, a sludge is essentially the opposite of a nudge; it\u0026rsquo;s a friction or barrier that makes it harder for people to accomplish something. Think about the convoluted process when you have to \u0026ldquo;unsubscribe\u0026rdquo; from a service.1 Sludges are usually put in place to dissuade people from taking certain actions.\nWhen examining if something is being indirectly optimized for, look for these nudges or sludges. They\u0026rsquo;ll often reveal the underlying objectives and biases of a system or piece of content. Let\u0026rsquo;s look at a few examples to understand how they shape our lives.\nI reside in Knoxville, specifically around the University of Tennessee campus in Fort Sanders. The locale, bursting with bars and places to score alcohol or weed, speaks volumes about what people optimize for \u0026mdash; mostly intoxication. Meanwhile, cafes and libraries \u0026mdash; places that inspire creativity and growth \u0026mdash; are few and far between.\nOur environment nudges us towards wasting our youthful energy on fleeting pleasures rather than anything enriching. What\u0026rsquo;s the sludge here?
The university library closes at 8 pm on weekends, implicitly pushing students towards less scholarly activities.\nBut does it have to be this way? Universities and societies could change their \u0026lsquo;nudges\u0026rsquo; to encourage lifelong learning, community building, or well-being, thus optimizing for more than just short-term pleasures. Jane Jacobs, the famous urbanist, once said: if a public space exists, people will use it.2\nSimilarly, the food landscape is an optimized mess. Fast food chains here are ubiquitous, making it harder to opt for a healthy meal. This isn\u0026rsquo;t universal; in other countries like Italy and Turkey, communal eating is more the norm. It\u0026rsquo;s almost as if the U.S. prioritizes speed over substance, even in diet. Johnny Harris did an interesting video on bread in the U.S. which exemplifies how long-lasting bread is the goal, not the nutrients it provides.\nEven then, the focus is mainly on calorie counts, often sidelining other vital nutrients. In India, the discussion on calories was limited to economics, where they were used for defining the poverty line and minimum wages, which are calculated based on an expectation of 2320 kilocalories a day.3 An old article in the Times of India had also identified the direct limitations of such metrics.\nI have seen better alternatives. While I was visiting Massachusetts General Hospital, Boston, I saw their cafeteria had a different kind of rating: red for items you should eat once in a while, yellow for items that you may eat occasionally, and green for items that you can eat three times a day. These ratings cover some additional distance when used with calories, but still miss the point: balanced diets are what make us healthy. They don\u0026rsquo;t solve the core problem that finding healthy food is hard in the U.S. due to systemic reasons, as I have written previously.\nLet\u0026rsquo;s take media coverage, where the nudge-sludge dichotomy manifests subtly. Nudges and sludges appear here through various means: vocabulary used (activist or terrorist?), tone (casual or gloomy?), and hyperlinks. These also indicate publisher biases.4 Writers often hyperlink to specific aspects of a story, directing the reader\u0026rsquo;s focus and understanding. Consider this article from The Juggernaut on the alleged assassination of a Sikh extremist in Canada.\nAs you can see, the article by The Juggernaut is loaded with hyperlinks to let the reader know of more details. However, if you read a little more closely, you would see that all the hyperlinks are about a specific cause: what led to the creation of the Khalistani movement. The part I\u0026rsquo;ve highlighted \u0026mdash; specifically about Operation Blue Star \u0026mdash; has absolutely no leads anywhere. What\u0026rsquo;s that? Why would the Indian armed forces \u0026ldquo;desecrate\u0026rdquo; a holy shrine? Doesn\u0026rsquo;t this point require more elaboration?\nWhen you Google Operation Blue Star, you\u0026rsquo;ll find that it was a military operation initiated by the Indian government in June 1984, aimed at flushing out Sikh militant Jarnail Singh Bhindranwale and his associates from the Golden Temple. What the article doesn\u0026rsquo;t mention is that the operation was a response to escalating demands for Sikh autonomy and Bhindranwale taking refuge in the Temple to evade arrest.
The military\u0026rsquo;s heavy-handed tactics were necessitated by the militants\u0026rsquo; advanced armaments and their use of civilians as human shields.\nThe absence of such crucial information reveals a biased presentation, reinforcing my point about media outlets and their optimized way of telling stories. Just like the environment nudges our behavior, hyperlinking in articles nudges our understanding. Writers choose specific aspects to focus on, molding our perspective and often leading us to question, \u0026ldquo;Why aren\u0026rsquo;t people talking about this?\u0026rdquo; This not only shows the media\u0026rsquo;s role in shaping narratives but also highlights how readers, as citizens, have their own ideas on what should be optimized for, muddying the waters further.\nAnother example: Pet food is often engineered for output parameters like solid poop rather than focusing on holistic health or taste for the animal.\nTake Bobi, the world\u0026rsquo;s oldest dog, who lived to be 31 years and 165 days old. He didn\u0026rsquo;t live on specialized dog kibble but ate food from his owners and resided on a farm. This contrasts with the more typical approach of feeding pets nutrient-optimized kibble in apartment settings, which may not necessarily cater to the overall well-being of the animal but certainly makes waste management easier.\nThese cases illustrate Goodhart\u0026rsquo;s Law in action. Whether it\u0026rsquo;s media narratives or pet food, once a particular metric is targeted for optimization, it often loses its effectiveness as a well-rounded measure of success. In sum, we live in a world where optimization is often misguided, targeting metrics that are convenient rather than holistic. From the urban designs that nudge us toward short-term pleasures, to the media landscapes that shape our perceptions in selective ways, we\u0026rsquo;re constantly guided by forces that don\u0026rsquo;t necessarily have our well-being at heart.\nThe problem is that these optimizations are often narrowly focused, neglecting the broader picture. By being mindful of the nudges and sludges in our environment and questioning what exactly is being optimized for, we can aspire to create systems that are more aligned with our own goals and values. In doing so, we may avoid falling into the trap outlined by Goodhart\u0026rsquo;s Law, recognizing that no single metric can capture the complexity of human experience.\nIn the context of subscriptions, they are often called \u0026ldquo;dark patterns\u0026rdquo;.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe exact way I had heard of it was \u0026ldquo;If you build a place for people to sit, they will come and sit\u0026rdquo;.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe recommendation is set by the Indian Council of Medical Research (ICMR). According to the data presented in parliament by Smriti Irani, Minister of Women and Child Development, the average consumption is lower than this.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSee this interesting interactive chart on media biases.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/what-are-you-optimizing-for/","summary":"In a world driven by optimization, individuals are guided by nudges promoting beneficial choices and sludges creating barriers.
Using examples from urban design, food choices, and media, I explore how these influences often prioritize convenience over genuine well-being, challenging you to critically assess what\u0026rsquo;s truly being optimized for in your lives.","title":"What are you optimizing for?"},{"content":"In a recent talk, I discussed my PhD research, which aims to leverage AI to solve two business problems - forecasting print demand at HP and optimizing guaranteed delivery advertising at Alibaba.\nAccurate demand forecasts can increase product availability and profitability for manufacturers like HP. We built machine learning models that outperformed existing statistical and human consensus methods at forecasting monthly demand for HP printing products.\nMeanwhile, digital ad platforms like Alibaba need to predict available ad inventory and strategically allocate it to advertisers through guaranteed delivery contracts. Our neural network model for Alibaba\u0026rsquo;s ad inventory allocation also beat benchmarks on both offline and online metrics while preventing overselling.\nIn both cases, AI and deep learning techniques were able to uncover insights that led to better supply chain and inventory optimization outcomes. I\u0026rsquo;m excited to continue developing these models and potentially work on other complex business prediction problems in the future.\nSincere thanks to Prof Charles for encouraging and helping through both projects.\nSlides:\nBA Forum - Slides - PDF\n","permalink":"/data-sciences-in-real-world/","summary":"A quick rundown of my doctoral research presented at the BA Forum 2023","title":"Artificial Intelligence and Data Sciences in Real-world Business"},{"content":"\nAs of today, I have 3,455 songs in my liked songs on Spotify (kinda trippy to say). Needless to say, I really like listening to songs. I have many playlists in my library, but almost all of them require my active attention to maintain.\nI have maintained my playlists religiously. Historically. Not recently. I have grown reliant on Spotify\u0026rsquo;s algorithm to suggest songs. However, the reliance on the algorithm has various pitfalls.\nFirst, it tends to \u0026ldquo;predict\u0026rdquo; which songs I\u0026rsquo;m likely not to skip. I am not sure that\u0026rsquo;s a parameter I\u0026rsquo;d personally optimise for. I want to discover new songs. Discover Weekly has never satisfied me. They are way too weird for my taste. They\u0026rsquo;re sometimes Tamil songs by artists that I listen to \u0026ndash; I\u0026rsquo;d skip them without second thoughts, simply because it\u0026rsquo;s highly unlikely I\u0026rsquo;d enjoy them. (Though that\u0026rsquo;s not universally true.)\nSecond, once Spotify learns what songs I like, it will recommend me the same ones over and over. Its algorithm is much better than YouTube\u0026rsquo;s for music, but still lacking in scope for me when I am trying to explore my own library.\nThird, there are many songs that Spotify\u0026rsquo;s algorithm recommended or that I Shazamed but haven\u0026rsquo;t listened to at all since adding them to my library. Spotify tends to play songs that are more recent in my library and that I have already played. Here\u0026rsquo;s Spotify\u0026rsquo;s nine-year-old article on how their Shuffle works.\nLong story short, I needed a way to handle it myself. I needed something that would:\nLook at my song library and identify songs that I added to my library more than 90 days ago. Create a weighted sample from those songs; the longer a song has been in my library, the more likely it should be to be picked up (see the sketch after this list). Randomly choose 100 songs from the weighted sample of all the old songs. Repeat the process every week so that I get a 100-song playlist every week. Automate all the above steps so that it all works automatically.
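As a quick aside, here is a minimal sketch of the weighted-sampling idea, assuming songs is a list of (track_id, age_days) pairs already filtered to 90+ days; the function and variable names are illustrative, not from my actual script. It uses a standard weighted-sampling-without-replacement trick: draw a random key u ** (1 / weight) for each song and keep the 100 largest keys, so older songs are more likely to be picked but no song appears twice.\nimport random\n\ndef pick_weighted(songs, k=100):\n    # songs: list of (track_id, age_days) pairs; age_days acts as the weight\n    keyed = [(random.random() ** (1.0 / age), track_id) for track_id, age in songs]\n    keyed.sort(reverse=True)  # the k largest keys win\n    return [track_id for _, track_id in keyed[:min(k, len(keyed))]]\nMy actual implementation below takes a simpler route (duplicating IDs by age), which also works but can draw the same song twice.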
I designed a complete system for myself. The system will look through my library, identify the songs that I liked long ago (more than 90 days ago), create a weighted sample of them based on how old they are as of today, and then draw 100 songs from it.\nIn case you are interested, here is how I did it. The complete code is on Github. In this blogpost, I am going to describe the major challenges, how I overcame them, and how you can create a similar automation service with your Spotify for free.\nHow I Created My \u0026ldquo;Discovered\u0026rdquo; Weekly Playlist? Spotify\u0026rsquo;s API has several limitations on what you can get from it. The biggest one is that you cannot get more than 50 songs from any function call.\nTherefore, I query songs one after another in smaller batches. Here is the code for the same.\n# Batching through songs, one \u0026#34;Verse\u0026#34; at a time\noffset = 0\nliked_songs = []\nwhile True:\n    batch = sp.current_user_saved_tracks(offset=offset)\n    liked_songs += batch[\u0026#39;items\u0026#39;]\n    if batch[\u0026#39;next\u0026#39;] is None:  # no more pages left\n        break\n    offset += len(batch[\u0026#39;items\u0026#39;])\nI was constantly debugging why I was only getting 50 songs at a time. The Spotipy package for Python, much to my dismay, doesn\u0026rsquo;t give any warning or even mention this limitation in its documentation. This way, I was able to get all 3000+ songs from my library in a single object.\nFiltering the songs that were older than 90 days wasn\u0026rsquo;t hard. All I needed to do was get the songs\u0026rsquo; added dates and calculate the age of the songs in my library. The ninja-technique there was in handling the time-zones, as has been my previous experience with Spotify when I was trying to understand my listening better.\n🥂 Aging like fine wine, older songs get more weight Once I got the time when they were added, I needed a method to weight older songs higher than newer songs. Like I said, my primary complaint with Spotify was that it played the same songs for me every (damn) time. Thus, as a statistician, I had to take a weighted sample. The below approach is not the most efficient but probably the simplest.\nweighted_songs = []\nfor song in liked_songs:\n    added_date = datetime.datetime.strptime(song[\u0026#39;added_at\u0026#39;], \u0026#34;%Y-%m-%dT%H:%M:%SZ\u0026#34;)\n    # Add UTC timezone information\n    added_date = added_date.replace(tzinfo=datetime.timezone.utc)\n    age_days = (datetime.datetime.now(datetime.timezone.utc) - added_date).days\n    if age_days \u0026gt; 90:\n        # Repeat each ID age_days times so older songs are drawn more often\n        weighted_songs += [song[\u0026#39;track\u0026#39;][\u0026#39;id\u0026#39;]] * age_days\nAfter getting the full list of pretentiously-serendipitous songs, I need to select 100 songs from that list. Easy peasy.\nrandom.shuffle(weighted_songs)\nselected_tracks = weighted_songs[:100]\nOnce I have the songs, I need to push them to a new playlist on Spotify. Since I want the entire process to repeat every Monday, it doesn\u0026rsquo;t need to create a new playlist every week. Rather, if the playlist already exists, it should just replace it. When you delete a playlist in Spotify, it goes to \u0026ldquo;Trash Can\u0026rdquo; (sort of).
I didn\u0026rsquo;t want to clutter my trash can either.\nIf a playlist by that name already exists, the following code will delete all songs from it and then add the 100 songs found this week.\nuser_id = sp.me()[\u0026#39;id\u0026#39;]\nplaylist_name = \u0026#39;random\u0026#39;\nplaylists = sp.user_playlists(user_id)\nexisting_playlist_id = None\n# Check if a playlist with the same name already exists\nfor playlist in playlists[\u0026#39;items\u0026#39;]:\n    if playlist[\u0026#39;name\u0026#39;] == playlist_name:\n        existing_playlist_id = playlist[\u0026#39;id\u0026#39;]\n        break\n# If it exists, clear the tracks\nif existing_playlist_id:\n    sp.playlist_replace_items(existing_playlist_id, [])\n    playlist_id = existing_playlist_id\nelse:\n    # If it doesn\u0026#39;t exist, create a new one\n    playlist = sp.user_playlist_create(user_id, playlist_name, public=True, description=\u0026#34;This week\u0026#39;s random songs that I haven\u0026#39;t listened in a while\u0026#34;)\n    playlist_id = playlist[\u0026#39;id\u0026#39;]\n# Add selected tracks to the playlist\nsp.playlist_add_items(playlist_id, selected_tracks)\nThat\u0026rsquo;s it! Getting the above code to execute wasn\u0026rsquo;t very hard. It hardly took an hour from start to end. However, I realised the bottleneck: this works if I only want to execute the process once. I wanted to automate it. It should do this every week, without me noticing it. This is where Github Actions comes into the picture.\n🤖 Getting it all automated\u0026hellip; Getting Github Actions to work with Spotipy was a big automation ask. You see, when you connect to Spotify\u0026rsquo;s API with Spotipy, you have to use a browser. You provide your client_id and client_secret with a redirect_uri (which needs to be http://localhost:8000 or something similar). Then, it opens a prompt saying \u0026ldquo;Do you authorize this app?\u0026rdquo;; you approve, then copy and paste the link in your VSCode tab. (Steps to get client_id and client_secret are near the end of this article.)\nYou need to do this the first time you run it. Once you approve the app, it creates an access token in your computer\u0026rsquo;s cache. However, this token is not permanent. There is a separate refresh token that makes sure the other tokens (primarily the access token) don\u0026rsquo;t get stale. This is interactive when it runs on your computer.\nWhen running the code in Github Actions, you don\u0026rsquo;t have access to a persistent cache. There is no browser authentication. I was not the first one to run into this issue. But I was able to solve this, again thanks to ChatGPT.\nBut first, we need the tokens for the first time.\n🌓 Tokens for the first time You will need to get the access tokens for the first time. Run the following script that gets the tokens.\nredirect_uri = \u0026#34;http://localhost:8000/\u0026#34;\nauth_manager = SpotifyOAuth(client_id=client_id, client_secret=client_secret, redirect_uri=redirect_uri, scope=\u0026#34;playlist-modify-public user-library-read user-read-recently-played\u0026#34;)\ntoken_info = auth_manager.get_access_token(as_dict=True)\ntoken_info has the token you need.
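The piece you will actually reuse in automation is the refresh token inside token_info. A quick way to pull it out, assuming the script above just ran in the same session (this snippet is mine, added for illustration):\nprint(token_info[\u0026#39;refresh_token\u0026#39;])\nThat printed value is what goes into the SPOTIPY_REFRESH_TOKEN secret below.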
Keep it safe to put in Github\u0026rsquo;s repository secrets.\nFrom there on, I am able to create an authenticated Spotify object from the refresh token alone, with no need for browser authentication.1\nimport base64\nimport os\nimport requests\nimport spotipy\n\n# Function to refresh the access token\ndef refresh_access_token(refresh_token):\n    client_id = os.environ[\u0026#39;SPOTIPY_CLIENT_ID\u0026#39;]\n    client_secret = os.environ[\u0026#39;SPOTIPY_CLIENT_SECRET\u0026#39;]\n    payload = {\n        \u0026#39;grant_type\u0026#39;: \u0026#39;refresh_token\u0026#39;,\n        \u0026#39;refresh_token\u0026#39;: refresh_token,\n    }\n    auth_header = {\u0026#39;Authorization\u0026#39;: \u0026#39;Basic \u0026#39; + base64.b64encode((client_id + \u0026#39;:\u0026#39; + client_secret).encode()).decode()}\n    response = requests.post(\u0026#39;https://accounts.spotify.com/api/token\u0026#39;, data=payload, headers=auth_header)\n    return response.json().get(\u0026#39;access_token\u0026#39;)\n\nclient_id = os.environ[\u0026#39;SPOTIPY_CLIENT_ID\u0026#39;]\nclient_secret = os.environ[\u0026#39;SPOTIPY_CLIENT_SECRET\u0026#39;]\nrefresh_token = os.environ[\u0026#39;SPOTIPY_REFRESH_TOKEN\u0026#39;]\nnew_access_token = refresh_access_token(refresh_token)\n# Set up Spotipy\nsp = spotipy.Spotify(auth=new_access_token)\nEssentially, the function uses a different approach to get the access token by making a raw POST request instead of using Spotipy. This works like a charm!\nWith all these steps, I was able to get the complete setup working! Now, every Monday morning, I wake up to a new playlist of 100 songs that I haven\u0026rsquo;t listened to in a while. What a wonderful way to start a week!\n🌟 You have created something awesome. How do I use this? Thanks, here are the steps. First, get your client ID and client secret from Spotify\u0026rsquo;s developers dashboard.\nHow to get Client ID and Client Secret? Here\u0026rsquo;s Spotify\u0026rsquo;s documentation on how to get the Client ID and Secret. developer.spotify.com\nGot it. Next? Fork my repository on Github. Then, head over to Secrets and Variables, Actions, and create these three repository secrets.\nYou should have the first two from your app and the last from the code snippet I asked you to execute earlier.\n🫶 Humma, humma\u0026hellip; Now you are done! Eat a 🎂 to celebrate or have a cup of coffee ☕. To test that everything is working, head over to the \u0026ldquo;Actions\u0026rdquo; tab on your repo, select the \u0026ldquo;Create Randomized Spotify Playlist\u0026rdquo; workflow and click run workflow.\nIt should take around 1-2 minutes and you should see a new playlist in your Spotify library! Pro-tip: pin it for easy access.\n🎧 My This Week\u0026rsquo;s Random Playlist Link to Spotify\nI\u0026rsquo;m kinda confused on how the access token and refresh token play with each other, so I\u0026rsquo;d appreciate your explanation if you know better.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/spotify-randomizer/","summary":"A new playlist of random songs that I haven\u0026rsquo;t listened to in a while, generated automatically every Monday","title":"Spotify Randomizer"},{"content":"Wow, what a ride! My 15-month time at HP Inc. feels like a full-on sprint through a techno-maze. Machine learning forecasting models for over 18,000 products in HP Print? Bring it on! 📊\nMy team was full of experts in supply chain modelling, machine learning and more. I worked closely with Cara Curtland, Jerry Hwang, Kevin Kacmarynsky, Barrett Crane and more from the SPaM team. Enjoyed learning from Pedro, Frederic, Chuck and Shawn. ✨\nI was an eager intern ready to tune up models and push them to production.
Last summer was all about the grind: understanding the codebase, optimizing the processes, and leveling up. This summer? Cranking up the machine learning parts, rolling it into production, and seeing it all play out. 🦾\nForecasting is like playing a game at different difficulty levels. There\u0026rsquo;s the Product Platform level (easy-peasy), the Base Product level (now we\u0026rsquo;re talking), and the SKU level (hardcore mode). My battleground? The Base Product level, turning it into SKU-level forecasting. In fact, we were the first to make SKU-level forecasts with ML for Print at HP. Achievement unlocked! 🎖️\nImproving the ML model wasn\u0026rsquo;t just about tinkering with the hyperparameter knobs. Encoding categorical variables, handpicking features \u0026ndash; all this geeky stuff boosted the accuracy. Imagine tweaking a custom rig: every little change gives more horsepower, especially when you\u0026rsquo;re on a monthly cadence with demand forecasting. 🚚\nData? Oh boy, it\u0026rsquo;s like a jungle out there. We used existing demand features, information about products and geographies, and a few engineered product life cycle variables. This summer, my work was to include channel partner inventory, sell-in and sell-through volume. I carved bike lanes through the jungle, built an ETL pipeline, and churned out something our ML pipeline could use. 🚵\nThe real boss-level challenge? Loading the forecast into our Integrated Business Planning (IBP) tool for use by all planners. We\u0026rsquo;re talking about 18,000 SKUs across 45 geographies! When it all clicked? Party time! 🌎\nOn the tool front, my arsenal grew with Python, Pandas, SQL, Terminal, and more. We used SQL pipelines for extracting and loading the data, LightGBM (a tree-based model) for forecasting with various custom tune-ups for modelling, Jupyter Notebooks to keep all our steps documented, and finally MLFlow to track all our experiments. You don\u0026rsquo;t realise the importance of the last step till you start playing with fifty experiments. 🐍\nMy Terminal command prowess improved too. From just cd .., now I can do a lot more than grep searches, and rm -rf .. (If you don\u0026rsquo;t know what you\u0026rsquo;re reading, DON\u0026rsquo;T write and press enter.) Also learnt about symbolic links, which are seriously neat. ☠️\nHP\u0026rsquo;s culture is the secret sauce. People aren\u0026rsquo;t just colleagues, they\u0026rsquo;re allies in a shared quest. My favorite mantra from my mentor Cara? \u0026ldquo;Fail Fast. Celebrate errors and innovations equally.\u0026rdquo; That\u0026rsquo;s not just talk; it\u0026rsquo;s HP\u0026rsquo;s working principle. Dare to fail, dare to innovate. ❤️\nThursdays were our geek fests: \u0026ldquo;Coding is Cool\u0026rdquo; sessions, filled with Python wisdom, SQL tricks, and yes, even ChatGPT experiments. And my cat Kaya was the star of the show. 🐈 flicks tail\nAll in all? An epic journey. HP, you\u0026rsquo;ve been a masterclass in innovation. SPaM, you\u0026rsquo;ve been an awesome ally. I look forward to continuing to work closely with the team while making my research my primary focus for the year. 🫡\nThanks for the XP, the lessons, and the code commits. Here\u0026rsquo;s to the next git branch of my life! 🚀\nMe with Cara, my mentor, in front of the patent wall which celebrates the patent holders at HP\u0026rsquo;s Vancouver office.
Some had over 500 patents!\nSome shots from office\u0026hellip; HP Vancouver Office, Summer 2023\nI also volunteered at HP\u0026rsquo;s booth during Portland Pride Parade 2023\n","permalink":"/hp-blog-2023/","summary":"During my 15-month internship at HP Inc., I dove into machine learning forecasting, tackling challenges from SKU-level predictions to data management. Collaborating with the SPaM team, utilizing innovative tools, and embracing HP\u0026rsquo;s culture of innovation and failure, I emerged with invaluable skills, insights, and memories.","title":"HP Internship: A Year and a Half in the Fast Lane"},{"content":"(This is a formal write-up of my internship work. Read this blog for a chirpy review of my internship!)\nDuring my 15-month internship at HP Inc., I was actively engaged in implementing and optimizing machine learning forecasting models for over 18,000 products within HP Print.\nWorking with the SPaM team in supply chain modeling and machine learning, my responsibilities ranged from:\nEnhancing existing processes and understanding the codebase in my early days. Advancing to more complex tasks such as leveling up machine learning components, integrating them into production, and pioneering SKU-level forecasting within the organization. Engaging in ML improvements like encoding categorical variables and feature engineering, which significantly boosted the model\u0026rsquo;s accuracy. Building an ETL pipeline to include channel partner inventory, sell-in, and sell-through volume, facilitating smoother data processing. Tools Used My work involved a multitude of tools and frameworks, vital to the execution of tasks, including:\nPython and Pandas: Primary language for modeling with LightGBM. SQL: Used in pipelines for data extraction and loading. Tableau: Dashboard for sharing results widely with all planners and forecasters. Terminal: Employed for various command-line operations. Learnt about symbolic links in particular, in addition to the use of grep, find, move and more. LightGBM: A tree-based model used for forecasting. MLFlow: To track all our experiments, proving indispensable in managing numerous trials. Impacts My contributions led to notable impacts within the organization:\nThe development of the first-ever SKU-level forecast with ML for Print at HP. Improved accuracy and efficiency of the forecasting model by introducing innovative changes. Created new pathways for new data by building an ETL pipeline, allowing for more effective demand forecasting. Successfully loaded the forecast into the Integrated Business Planning (IBP) tool, aiding planners across 18,000 SKUs and 45 geographies. I embraced the culture of innovation at HP, adhering to principles such as \u0026ldquo;Fail Fast\u0026rdquo; and actively participating in \u0026ldquo;Coding is Cool\u0026rdquo; sessions. This experience has been an invaluable part of my professional growth, and I look forward to continuing my collaboration with the team while prioritizing my research in the coming year.\n","permalink":"/hp23/","summary":"Creating ML demand forecast for print products at HP Inc. using LightGBM and pushing it to production for wide adoption.","title":"ML Forecasting at HP Inc."},{"content":"Last week was quite busy for me. It was my first time attending and presenting at KDD. The 29th ACM SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) is ACM (Association for Computing Machinery)\u0026rsquo;s influential conference on machine learning, AI and everything in between.
It is one of the most popular conferences in the field of data mining in the world.\nI also presented my research work on end-to-end inventory prediction and optimization for use in the guaranteed delivery advertising field. The work was done in collaboration with Alibaba and is currently in production on their website. You can learn more about my presentation at https://www.harsh17.in/kdd2023/.\nBelow are my notes on the talks I attended. Full proceedings are at: https://kdd.org/kdd2023/wp-content/uploads/2023/08/toc.html\nKeynotes Jure Leskovec, SIGKDD Innovation Award Winner 2023 Jure Leskovec is a professor at Stanford and was the Chief Scientist at Pinterest. He was the winner of the SIGKDD Innovation Award 2023.\nWebsite: https://cs.stanford.edu/people/jure/\nMemetracker is an online tool that tracks the most mentioned phrases by analyzing 900,000 news stories and blog posts per day. It is available at http://snap.stanford.edu/memetracker/index.html Currently, our work involves the use of tabular data. However, the more natural state of data is graphs. Graphs showcase the relationships between different datasets and can also use information about the neighbours, which tabular datasets cannot. We need a \u0026ldquo;transformer\u0026rdquo; for a database: something fundamental that transforms how we do all data analysis, just the way transformers changed deep learning. Graph neural networks learn information from neighbours to obtain enhanced node representations. PyG from his team is the most widely used graph NN package. Ed Chi, Google (Keynote Day 1) Ed H. Chi is a Distinguished Scientist at Google, leading several machine learning research teams focusing on neural modeling, reinforcement learning, dialog modeling, reliable/robust machine learning, and recommendation systems in the Google Brain team.\nWebsite: https://sites.google.com/view/edchi/\nLLMs have raised the expectations of what we expect from ML and AI models\n100 years ago, we couldn\u0026rsquo;t fly. Today, we are irritated if our flight is late by half an hour\nChain of thought prompting results in better model outputs than base model outputs\nIn simple terms, it means giving examples of what you want from the model\nAlso called few-shot learning\nhttps://arxiv.org/pdf/2201.11903.pdf\nSelf-consistency Decoding\nIn critical tasks, ensemble model outputs into one output\nAsk the same question several times, take the majority vote\nTask decomposition\nFor complex tasks, decompose into smaller tasks. Either ask the model to break it down before attempting to solve it, or break it down yourself\nInstruction tuning (prompt engineering) works better with more advanced models than simple models. In some small models, fine-tuning or better prompting results in no improvement at all.
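To make the self-consistency idea above concrete, here is a toy sketch; ask_model is a hypothetical function (mine, not from the talk) that returns one sampled answer per call:\nfrom collections import Counter\n\ndef self_consistent_answer(prompt, n=5):\n    # Ask the same question several times and take the majority vote\n    answers = [ask_model(prompt) for _ in range(n)]\n    return Counter(answers).most_common(1)[0][0]\nThis only helps when answers are short and comparable (say, a number or a multiple-choice letter), since the vote is over exact strings.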
Evaluating outputs is critical\nSimilar to how we had to deal with recommender system outputs Eric Horvitz, Microsoft (Day 2 Keynote) GPT performs better than most humans in medical licensing exams (almost perfect at 99.9%)\nMedical error is the third largest cause of death in the US, after heart diseases and cancer (BMJ)\nAI enables computation, which enables calculating the expected value of taking an action or not taking an action\nMicrosoft Teams: to minimise audio errors coming into a group call, predict when a person in a group call is likely going to speak\nP(Action | Information, AI-assistance) \u0026gt; P(Action | Info) or P(Action | AI-assistance)\nOptimise for copilot\nThere are areas where AI makes errors and areas where humans make errors\nA combo of both leads to a better world\nI asked the question: \u0026ldquo;What is a task that AI wouldn\u0026rsquo;t be able to do in five to ten years?\u0026rdquo;\nQuestion: Five years ago, if you would have asked me if AI could sketch, I would laugh. Two years ago, if you would\u0026rsquo;ve asked if I can have an interesting argument with an AI, I would have said no. Today, I use it for coding, sketching and a lot more. A lot of \u0026ldquo;creative skills\u0026rdquo; can be done by AI. In fact, it performs better than most humans on creativity tests. What\u0026rsquo;s something that AI wouldn\u0026rsquo;t be able to do in ten years? Exclude jobs that we don\u0026rsquo;t want it to do: SC judges, caretakers, etc.\u0026rdquo;\nAnswer: There will be new jobs that\u0026rsquo;ll get created due to AI. It is difficult to say which jobs, exactly. (He said more but that\u0026rsquo;s the gist of it.)\nLarge Language Models Day Jaime Teevan (Microsoft) http://teevan.org/about/index.htm\nRetrieval-based learning is private by design, as only the relevant information is communicated via API to the LLM service provider\nThe rest of the document information is stored locally in a VectorDB of embeddings\nThese documents, which have so far been isolated in corporate settings and accessible only to the \u0026ldquo;shared\u0026rdquo; parties, can come together to be part of one database that all in the company can access\nLike Google Maps, this forms a collaborative knowledge base \u0026ndash; one brain to feed it all, one source of truth, one access protocol (with several access levels)\nDenny Zhou (Google DeepMind) https://dennyzhou.github.io/\nChain of thought prompting works better than one-shot or few-shot prompting in larger models\nGiving specific examples of what you want as the output from the model is better than suggesting the kind of output you want\nIf you want it as a JSON file, say that\nIf you want it as a pd.DataFrame({\u0026hellip;}), say that\nIf you want it as a markdown table, say that\nBIG-Bench is Google\u0026rsquo;s set of evaluation tasks for LLMs\nhttps://github.com/google/BIG-bench\nOpenAI\u0026rsquo;s evals: https://github.com/openai/evals\nVedanuj Goswami https://vedanuj.github.io/\nEven with long training time and data, the models don\u0026rsquo;t show any sign of slowing down.
More data and compute keep making these models better and better\nFor fine-tuning the model (LLaMA 2), perform RLHF and Rejection Sampling\nWeekly cadence in model output checks: RLHF, comparison between human and LLM output\nIn adversarial prompts, prefix with safety words to reduce their impact\nIn the system prompt, feed in critical system values (corporate values, etc.)\nJason Wei (OpenAI) https://www.jasonwei.net/\nScaling Laws Tooling and infrastructure matter as more collaborators get together to work. Next word prediction is plateauing in performance, but there are emergent abilities (more on that later). Emergent Abilities Defined as abilities that the model is not explicitly trained for but performs well on. 33% of all tasks are done better by larger models. Smaller models are great for tasks such as summarisation and search. Larger models are great for reasoning, solving problems and coding. What task becomes emergent is an open research question \u0026mdash; without trying large models on a full array of QA-pairs, of course. See Google\u0026rsquo;s BIG-Bench (creatively named \u0026ldquo;Beyond the Imitation Game\u0026rdquo;, the largest QA dataset for evals). Benchmarks for QA quickly become outdated. LLMs can beat many creativity tests, Turing tests, knowledge tests, reasoning tests, or any such tests that we set up as benchmarks. What is a good benchmark? Does it have to be constantly changing? One size doesn\u0026rsquo;t fit all. Some models are better at some tasks than others. Research should identify which task - which model. Reasoning via prompting Chain of thought (CoT) reasoning differentiates GPT from previous ML models https://arxiv.org/abs/2201.11903 CoT helps large models, hurts small models (i.e. helps GPT-4, hurts GPT-3.5, ambiguous with GPT-3.5). The black magic of ML: hyperparameters; the black magic of LLMs: prompting (prompt engineering is thus important). Applied Data Science Track BERT4CTR: Using BERT for Predicting Click-through Rates https://dl.acm.org/doi/10.1145/3580305.3599780\nUse fusion algorithms to include the embeddings from the LLM into the models\nNumBERT: a model to convert non-textual features to textual features\nResearch by Google et al. using BERT (https://arxiv.org/abs/2010.05345)\nThis paper is super interesting, as they found that converting numbers to statements like \u0026ldquo;This is heavy\u0026rdquo; or \u0026ldquo;this is large\u0026rdquo; was helpful in regression and classification tasks\nBERT4CTR takes the first step to convert these non-textual features to textual features and uses them in predicting CTR\nAll non-text tokens are converted to a single token (how?)\nUses \u0026ldquo;uni-attention\u0026rdquo; to create interactions between non-textual and textual features\nDimensionality reduction for embeddings\nMy observation from OpenAI\u0026rsquo;s embeddings was that they were so dense that reduction caused information loss. Maybe not in this case, as the same numerical information is represented in multiple variables, which BERT notices and removes. QUERT: Query Understanding in Travel Domain https://dl.acm.org/doi/10.1145/3580305.3599891\nUsing LLMs, understand the search query better to streamline the model for recommendation and search engines\nA query has more than intent: it has geography, time, etc.\nPhrase permutation is real: \u0026ldquo;weather new york\u0026rdquo; and \u0026ldquo;new york weather now\u0026rdquo; are likely the same things. LLMs can streamline them into one\nhttps://github.com/hsaest/QUERT
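As a toy illustration of the NumBERT-style idea above (my own sketch, not from either paper), converting a numeric feature into a short textual statement could look like this, with made-up thresholds:\ndef weight_to_text(weight_kg):\n    # Map a raw number to a coarse textual statement a language model can consume\n    # (thresholds here are purely illustrative)\n    if weight_kg \u0026lt; 1:\n        return \u0026#34;This is light.\u0026#34;\n    elif weight_kg \u0026lt; 20:\n        return \u0026#34;This is moderately heavy.\u0026#34;\n    return \u0026#34;This is heavy.\u0026#34;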
From Human Days to Machine Seconds, Iddo Drori https://arxiv.org/abs/2206.05442\nHe works at MIT/Columbia and was trying to create questions for MIT\u0026rsquo;s final exams using LLMs\nQuestions would be typical exam questions and would then contain the response to that question from an LLM\nThe task for students was: check if the LLM\u0026rsquo;s answer is correct or wrong. If correct, explain why. If wrong, explain why and write the correct response.\nCan we teach LLMs to create questions for tests and find answers?\nUsing LLMs to evaluate responses generated by LLMs\nEvaluation of specific responses using meta-questions\nZero-shot, 1-shot, few-shot, and chain-of-thought all lead to different levels of accuracy\nZero-shot: base LLM\n1-shot: use the single most similar question from history\nN-shot (few-shot): use the N most similar questions from history\n","permalink":"/reflections-from-kdd-2023/","summary":"My notes on talks I attended (mostly on LLMs) at 29th ACM SIGKDD 2023 at Long Beach, CA","title":"Reflections from KDD 2023"},{"content":"Recently, our research on an end-to-end inventory prediction and contract allocation model was accepted to the KDD 2023 conference. I presented our paper in Long Beach, CA between August 6-10, 2023. KDD (29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining) is the premier conference on data science.\nFollowing are the links to the paper, slides, poster and a two-minute overview of our research work. Scroll down for some pictures. If you have any questions, feel free to email!\nLinks Paper PDF\nSlides\nPoster\nVideo\nAbstract Guaranteed Delivery (GD) advertising plays an essential part in e-commerce marketing, where the ad publisher signs contracts with advertisers in advance by promising delivery of advertising impressions to fulfill targeting requirements for advertisers. Previous research on GD advertising mainly focused on online serving yet overlooked the importance of contract allocation at the GD selling stage.\nTraditional GD selling approaches consider impression inventory prediction and contract allocation as two separate stages.\nHowever, such a two-stage optimization often leads to inferior contract allocation performance. In this paper, our goal is to reduce this performance gap with a novel end-to-end approach. Specifically, we propose the Neural Lagrangian Selling (NLS) model to jointly predict the impression inventory and optimize the contract allocation of advertising impressions with a unified learning objective.\nTo this end, we first develop a differentiable Lagrangian layer to backpropagate the allocation problem through the neural network and allow direct optimization of the allocation regret. Then, for effective optimization with various allocation targets and constraints, we design a graph convolutional neural network to extract predictive features from the bipartite allocation graph. Extensive experiments show that our approach can improve GD selling performance compared with existing two-stage approaches.\nParticularly, our optimization layer can outperform the baseline solvers in both computational efficiency and solution quality.\nTo the best of our knowledge, this is the first study to apply the end-to-end prediction and optimization approach for industrial GD selling problems. Our work has implications for general prediction and allocation problems as well.\nPictures Citation Wuyang Mao, Chuanren Liu, Yundu Huang, Zhonglin Zu, M Harshvardhan, Liang Wang, and Bo Zheng. 2023.
End-to-End Inventory Prediction and Contract Allocation for Guaranteed Delivery Advertising. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD \u0026lsquo;23), August 6\u0026ndash;10, 2023, Long Beach, CA, USA.\nI sincerely thank Prof ChuanRen Liu for his guidance on this project and the opportunity to contribute meaningfully.\n","permalink":"/kdd2023talk/","summary":"We proposed a novel end-to-end approach, the Neural Lagrangian Selling (NLS) model, to improve Guaranteed Delivery (GD) advertising by concurrently predicting ad impression inventory and optimizing contract allocation","title":"End-to-End Inventory Prediction and Contract Allocation for Guaranteed Delivery Advertising"},{"content":"Ever since I began chronicling my coffee journey, things have become intriguingly more flavorful. ☕\nThe origins of a coffee bean can dictate much of its taste profile. For instance, single-origin coffee, where beans are sourced solely from one region, usually outshines blends, which mix beans from multiple origins.\nThe environment in which coffee grows, specifically the soil and climate, further impacts its flavor. African coffee, primarily from Kenya 🇰🇪, Ethiopia 🇪🇹, and Rwanda 🇷🇼, offers a delightful fruity sweetness. Conversely, South American beans (primarily from Colombia 🇨🇴 and Brazil 🇧🇷) impart a nutty, caramel-like note. As for Asian varieties (from Indonesia 🇮🇩, Vietnam 🇻🇳, India 🇮🇳), they bear a subtle spice kick, which explains the scarcity of such origins in American cafes.\nMy favourite beans so far have been Blue Tokai\u0026rsquo;s Attikan Estate (grown in Biligiriranga Hills in Karnataka 🇮🇳), Atomic Coffee Roasters\u0026rsquo; Black Velvet (grown in Honduras \u0026amp; Guatemala), and Gimme! Coffee\u0026rsquo;s Eternal Flame.\nCoffee beans I\u0026rsquo;ve tried so far. Blue Tokai\u0026rsquo;s Attikan Estate (grown in Biligiriranga Hills in Karnataka 🇮🇳), Atomic Coffee Roasters\u0026rsquo; Black Velvet (grown in Honduras \u0026amp; Guatemala), and Gimme! Coffee\u0026rsquo;s Eternal Flame have been my favourite. Starbucks must\u0026rsquo;ve used additional flavours or syrups as their beans neither smelled nor tasted good. (Not all Starbucks are bad though.)\nThe roasting process also plays a crucial role in shaping coffee\u0026rsquo;s taste. Coffee seeds are inherently bitter, and roasting mellows this bitterness, unveiling the true flavors. Hence, light roasts are often more bitter than dark roasts, a fact contrary to common belief.\nGiven the profound influence of beans, their freshness significantly affects the overall experience. Freshly roasted beans boast a robust aroma and taste compared to month-old ones, like those you might encounter at Starbucks. Typically, high-quality beans retain their taste for two weeks, or up to four weeks when stored in an airtight container. While some people resort to refrigeration for longevity, I\u0026rsquo;ve not found substantial differences.\nThen comes the pivotal step that Baristas are celebrated for\u0026mdash;brewing. There exist at least ten diverse brewing methods.\nMy experiments have led me to the Moka pot, French Press, Drip coffee (pour over), and Aeropress. (The latter two didn\u0026rsquo;t make it to the meme.) As a Latte aficionado, Aeropress resonates best with me. Using the Moka pot has also been fun, although it stayed back at my mother\u0026rsquo;s place in India after my visit last December.\nHowever, none of these methods can produce a satisfactory espresso shot.
For that, one either needs to invest in a dedicated espresso machine or visit a decent cafe (not Starbucks; their beans are a deal-breaker).\nFinally, the brewing technique, as emphasized by James Hoffmann, is paramount. I\u0026rsquo;m still honing this step, so stay tuned for more!\n","permalink":"/what-makes-a-good-coffee/","summary":"This piece explores the intricacies of coffee, from the influence of its origin and roasting process to the importance of freshness and brewing techniques, primarily from my experience.","title":"What makes a good coffee?"},{"content":"Recently, our research on an end-to-end inventory prediction and contract allocation model got accepted to the KDD 2023 Conference. I presented the paper in Long Beach, CA between August 6-10, 2023. KDD (29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining) is the premier conference on data science. I am thrilled to share our research. The poster presentation and the talk went great. Scroll down for some pictures!\nLinks Paper PDF\nSlides\nPoster\nVideo\nAbstract Guaranteed Delivery (GD) advertising plays an essential part in e-commerce marketing, where the ad publisher signs contracts with advertisers in advance by promising delivery of advertising impressions to fulfill targeting requirements for advertisers. Previous research on GD advertising mainly focused on online serving yet overlooked the importance of contract allocation at the GD selling stage.\nTraditional GD selling approaches consider impression inventory prediction and contract allocation as two separate stages.\nHowever, such a two-stage optimization often leads to inferior contract allocation performance. In this paper, our goal is to reduce this performance gap with a novel end-to-end approach. Specifically, we propose the Neural Lagrangian Selling (NLS) model to jointly predict the impression inventory and optimize the contract allocation of advertising impressions with a unified learning objective.\nTo this end, we first develop a differentiable Lagrangian layer to backpropagate the allocation problem through the neural network and allow direct optimization of the allocation regret. Then, for effective optimization with various allocation targets and constraints, we design a graph convolutional neural network to extract predictive features from the bipartite allocation graph. Extensive experiments show that our approach can improve GD selling performance compared with existing two-stage approaches.\nParticularly, our optimization layer can outperform the baseline solvers in both computational efficiency and solution quality.\nTo the best of our knowledge, this is the first study to apply the end-to-end prediction and optimization approach for industrial GD selling problems. Our work has implications for general prediction and allocation problems as well.\nPictures Citation Wuyang Mao, Chuanren Liu, Yundu Huang, Zhonglin Zu, M Harshvardhan, Liang Wang, and Bo Zheng. 2023. End-to-End Inventory Prediction and Contract Allocation for Guaranteed Delivery Advertising.
In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD \u0026lsquo;23), August 6\u0026ndash;10, 2023, Long Beach, CA, USA.\nI sincerely thank Prof Chuanren Liu for his guidance on this project and the opportunity to contribute meaningfully.\n","permalink":"/kdd2023/","summary":"We proposed a novel end-to-end approach, the Neural Lagrangian Selling (NLS) model, to improve Guaranteed Delivery (GD) advertising by concurrently predicting ad impression inventory and optimizing contract allocation. The model incorporates a differentiable Lagrangian layer and a graph convolutional neural network to enable direct optimization of allocation regret and effective handling of various allocation targets and constraints. \u003ca href=\"https://www.harsh17.in/docs/kdd2023/E2E_Paper.pdf\"\u003e🔗 PDF\u003c/a\u003e","title":"End-to-End Inventory Prediction and Contract Allocation for Guaranteed Delivery Advertising"},{"content":"Sometimes, a journey far from home can unexpectedly transport you back to cherished memories. Such was my experience during my two-day sojourn in Seattle, the Emerald City. Seattle\u0026rsquo;s urban charm, combined with its vibrant market scene, immediately stirred recollections of bustling \u0026ldquo;Sabzi Bazaars\u0026rdquo; back home in India, igniting a sense of nostalgia and familiarity that added warmth to my adventure.\nBut let\u0026rsquo;s start from the beginning.\nStarbucks Reserve The first day kicked off with a visit to Starbucks Reserve. This is no ordinary Starbucks outlet; it is an immersive and dramatic expression of coffee passion. The highlight of the experience was tasting their latest offering, Oleato \u0026mdash; a curious yet delightful combination of latte with olive oil. It was a warm, aromatic experience that truly woke up my senses, preparing me for the adventures ahead.\nStarbucks Reserve is a special roastery preparing \u0026ldquo;reserve\u0026rdquo; products: what Starbucks considers its rarest and best-quality coffees, usually single-origin coffees. There are six locations worldwide: Seattle, Shanghai, Milano, Tokyo, Chicago, and (the newest) New York. In addition to coffee, they also sell coffee-cocktails.\nStarbucks uses these big machines to roast and grind beans in store. I am not a big fan of Starbucks; they make coffee-flavoured drinks, not coffee. But I liked Starbucks Reserve. They had single-origin coffee, a well-made espresso, and a good-tasting latte. Here, the meticulous attention to detail and unwavering commitment to the art of coffee-making is palpable.\nStarbucks\u0026rsquo; ex-CEO Howard Schultz was visiting Sicily, Italy when he met several locals drinking olive oil as their morning ritual. He followed along, adding it to his morning coffee. He had an idea: why not mix them together? Soon after, Starbucks launched Oleato. Their penchant for innovation is another factor that sets them apart, continually pushing the boundaries of what\u0026rsquo;s possible in the realm of coffee.\nSeattle Great Wheel In the evening, a ride on the iconic Seattle Great Wheel offered a stunning perspective on the city. Perched right on the Pacific Ocean, the giant wheel offered a panorama that was an absolute feast for the eyes: the twinkling Seattle skyline in one direction, the vast expanse of the ocean shimmering in the other. As the wheel slowly ascended, the breathtaking view was gradually unveiled, making for a memorable experience.\nSpicy Ramen The day ended on a flavorful note at Wasabi Sushi \u0026amp; Izakaya.
Here, I ordered a bowl of ramen, daringly requesting it to be \u0026ldquo;Thai spicy\u0026rdquo;. The waitress and the chef did an amazing job, creating a delicious, fiery bowl of ramen that tickled my taste buds with just the right level of spice.\nIt is seriously hard to get spicy, flavourful food in the US. Some places that manage to strike the balance deserve a gold medal. 🏅 Like this Japanese place. The harmony of flavors and textures in the ramen was perfect. It was rich, spicy, and comforting \u0026ndash; just what I needed to end the day.\nMeeting Bob and Jenny I also met Bob and Jenny, friends of a friend. Bob grew up in India and we had Thukpa at The Everest Kitchen. Thukpa is a Tibetan noodle soup that\u0026rsquo;s super popular in India. I think I was having it for the first time in 4-5 years.\nChihuly Garden and Glass The next day was all about exploring Seattle\u0026rsquo;s cultural side. It started with a visit to the Chihuly Garden and Glass museum, a place where the art of Dale Chihuly comes alive in a spectacular fashion. The vibrantly colored glass sculptures, beautifully intertwined with nature, offered a visual spectacle unlike any other.\n","permalink":"/seattle/","summary":"Sometimes, a journey far from home can unexpectedly transport you back to cherished memories. Such was my experience during my two-day sojourn in Seattle, the Emerald City.","title":"Seattle: Echoes of Home in the Emerald City"},{"content":"\nIn the bustling realm of deep learning, classifying facial images accurately by age, gender, and race presents a unique set of challenges. Taking these challenges head-on, a project was undertaken on the FairFace dataset (108,501 images), aiming to create a classification model with a two-stage method while utilizing just a fraction (15%) of the complete dataset. Using a Variational Autoencoder (VAE) to project images to a latent dimension and then training a Convolutional Neural Network (CNN) classifier on the VAE-generated encodings, the project attempted to streamline the process of image classification.\nVariational Autoencoders (VAEs) are central to the facial analysis model developed in this project. As a class of generative models, VAEs utilize deep learning to reduce the dimensionality of data, thus helping to project facial images into a latent space. Here, meaningful features are extracted and used to enhance the accuracy of image classification. The unique selling point of VAEs lies in their capability to learn and encode essential aspects of the data in a condensed and more manageable form. This approach is efficient, particularly when dealing with limited datasets, as it not only conserves computational resources but also optimizes the use of available data.\nBy training the autoencoder on just 15% of the dataset, the VAE successfully captures significant features, leading to improved performance in downstream classification tasks. An added advantage of VAEs is their inherent stochasticity, making them generative models. This allows the VAE in this project to generate new data, thereby augmenting data and enriching the diversity of facial image representations. Moreover, these latent space representations lend themselves to transfer learning, accelerating the training process and optimizing computational resources for other related tasks.
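To make the two-stage recipe concrete, here is a minimal sketch in PyTorch. Everything in it is an illustrative assumption (the layer sizes, the 64x64 input, and the nine output classes standing in for FairFace's age buckets), not the project's actual architecture, which lives in the GitHub repo linked below.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Stage one: the VAE encoder projects an image to a latent distribution.
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)

    def forward(self, x):
        h = self.conv(x)
        return self.fc_mu(h), self.fc_logvar(h)

def reparameterize(mu, logvar):
    # The sampling step that makes a VAE stochastic, and hence generative.
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

class LatentClassifier(nn.Module):
    # Stage two: a small classifier trained on the VAE-generated encodings.
    def __init__(self, latent_dim=64, n_classes=9):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))

    def forward(self, z):
        return self.net(z)

encoder, clf = Encoder(), LatentClassifier()
x = torch.randn(8, 3, 64, 64)             # a dummy batch of face images
mu, logvar = encoder(x)
logits = clf(reparameterize(mu, logvar))  # class scores from latent codes

Because the classifier sees only the compact latent codes, it trains quickly even when the encoder was fitted on a small slice of the data.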
This synthetic data generation capability of VAEs improves the performance of downstream tasks and aids in creating datasets for research, taking into account privacy concerns.\nAlthough the project did not yield the desired performance in terms of classification, it did pave the way for future improvements by leveraging larger datasets and increased training times. In hindsight, it was obvious that the Random Forest model, used as a benchmark, would underperform in comparison to the proposed model. This indicated that despite its limitations, the developed model surpassed the baseline and showed potential for use in other related tasks.\nIn essence, the project offered invaluable insights into the construction and application of autoencoder models in the world of deep learning. It demonstrated the power of transfer learning, paving the way for faster, more efficient models capable of extracting pre-trained features for a variety of tasks. The project stands as a testament to the ongoing journey towards achieving fairness in AI and ensuring enhanced facial recognition systems through learning richer representations.\nLink to Github: cosc-525/final-project at main · harshvardhaniimi/cosc-525 (github.com)\n","permalink":"/predicting-race-age-and-gender-from-face-a-small-sample-example-with-encoding/","summary":"Predicting race, age and gender of faces using Variational Autoencoders and Convolutional Neural Networks.","title":"Predicting Race, Age and Gender from Face: A Small-sample Example with Encoding"},{"content":"Welcome to the story of my 10-day nature escape, a journey from Maupin to Moab to Glenwood Springs that promises a delightful mix of adventure, discovery, and tranquillity. Join me as I traverse vibrant landscapes, make friends with Frank, and delve into the history of Moab along the way.\nI present to you not just the well-captured visuals of my experiences, but also the essence of my journey, as I revisit the star-studded nights, adrenaline-filled days, and the soulful quiet in between.\nDay 1 - First Steps and Pawprints Our journey kicked off with camping near Maupin, OR. The real star of the show, however, was Frank, Richard\u0026rsquo;s German shepherd: an adorably confused guard dog who bravely barks at newcomers, only to cover them in affectionate licks moments later. Maybe he needs to revisit his guard dog training manual!\nOur camping spot near Maupin.\nDay 2 - An Intimate Introduction to Idaho We settled into a somewhat nondescript camping spot, 27 miles from Boise and just 10 miles from Idaho City. It wasn\u0026rsquo;t spectacular, but it offered a calm and peaceful night under the starlight nonetheless.\nDuring my journey, I crossed through a town named Madras, which made me think of its relation to Madras (Chennai), the south Indian city.\nMadras is the old name of Chennai, a south Indian city. Chennai was established as a fort town by the British to monitor and protect their navy. I was surprised to see Madras here in Oregon!\nDay 3 - Moab Magic Begins Arriving in Moab was like stepping into a scene from a landscape painting. After two days of camping, a short shower and cozy bed at Field Station felt great.\nRock formations at the Arches National Park. These are formed due to erosion, initially due to water and now due to air.
The forces of erosion are sculpting more than just arches.\nBalanced Rock (second from right, height 128 ft/39 meters) clearly shows the various layers responsible for this amazing defiance of gravity. The caprock of the hard Slick Rock Member of the Entrada Sandstone is perched upon a pedestal of mudstone. This softer Dewey Bridge Member of the Carmel Formation weathers more quickly than the resistant rock above. Eventually, the faster-eroding Dewey Bridge will cause the collapse of Balanced Rock.\nOur car ride around Arches National Park was a spectacle best appreciated after 4pm, a tip for those keen to avoid long wait times (and entrance fees). When I later embarked on a bike ride, over 50 cars were queued for entry.\nDriving through the Arches is the best way to see it all. The roads inside the park run about 30 miles and take you through viewpoints and trailheads. The next day, when I was biking, I cycled around 10 miles before (getting tired and) deciding to return to other greenways.\nDay 4 - Biking Bliss in Moab Moab is filled with bike trails all around the Arches and Canyonlands National Parks. You can ride anytime as the parks are open 24 hours.\nA day dedicated to biking around Arches National Park and greenways offered a thrilling ride amidst breathtaking scenery. The evening was spent at the Sunset Grill, learning about Charles Steen\u0026rsquo;s history and how Moab emerged from the Uranium mining industry.\nCharles Steen, a determined yet struggling miner in the early 1950s, embarked on a quest for uranium, lured by the US government\u0026rsquo;s $10,000 reward. In his last-ditch effort, digging seventy feet into the earth, he struck gold - or rather, uranium! His discovery dramatically transformed his family\u0026rsquo;s lifestyle, from washing laundry at the Colorado river bank to sending it via flight to Denver.\nSteen swiftly erected a luxurious $250,000 mansion in Moab, complete with a pool, greenhouse, and servants\u0026rsquo; quarters. His lavish lifestyle included a private plane and weekly dancing lessons in Salt Lake City. However, his generosity was legendary, hosting annual parties for all Moab residents and making substantial donations to a local hospital.\nHis monumental find put Moab on the map, dubbing it the \u0026ldquo;Uranium Capital of the World,\u0026rdquo; and creating plentiful jobs. He was even elected to the Utah State Senate in 1958. But fate had a twist in store. When the country no longer needed his ore, Steen suffered significant financial setbacks, declaring bankruptcy in 1968.\nSunset Grill, a family-owned restaurant which used to be the home of Charles Steen \u0026mdash; godfather of Uranium mining in Moab.\nDespite losing his riches, Steen\u0026rsquo;s legacy lives on. His mansion, now the Sunset Grill restaurant, stands as a testament to his influence, overlooking Moab just as Steen once did. From rags to riches and back to rags, Steen\u0026rsquo;s story remains etched in the annals of Moab\u0026rsquo;s history.\nDay 5 - Brewing Stories and Adventures Say hi to Frank!\nFrom visiting the Moab Brewery to hiking to Corona Arch, the day was a mix of exhilarating exertion and soothing relaxation. Richard\u0026rsquo;s 4x4 off-roading took us halfway through Poison Spider Trail, offering a strong adrenaline rush.\nCorona Arch Hike, Canyonlands National Park. The hike was short (3 miles) but had a steep climb on the hills. At one place, chains were provided to hold tight but not everyone needed them.\nHistoric Uranium mines.
You can see the paths used by transportation trucks that are now used by OHV and ATV vehicles. (Canyonlands National Park)\nIsland in The Sky (such a cool name!), the north end of Canyonlands National Park, provides breath-taking panoramic views of the canyons.\nDay 6 - Glenwood Springs Reprieve After the previous day\u0026rsquo;s exploits, the camping at Glenwood Springs was an opportunity to unwind, accompanied by the incredible storytelling of Ted Chiang\u0026rsquo;s Exhalation. I almost finished the book that had long been languishing on my reading list, as Dea knows.\nTed Chiang\u0026rsquo;s Exhalation has an awesome set of short sci-fi stories. Some are truly astounding like The Merchant and The Alchemist, Exhalation, The Truth of Fact\u0026hellip;, and Omphalos. Ted Chiang is amazing at weaving stories.\nOne story had previously encouraged me to write the blog on memory: https://www.harsh17.in/infallible-memory/\nDay 7 - Hot Springs and High Stakes Rafting in the strong waves of cold water brought life back into me. By the way, cold showers are known to elevate mood. It is one of the few known activities that do not cause a dip in endorphins afterwards. Having a bad day? Take a cold shower!\nThe day was a cocktail of exhilarating white-water rafting and the serene Iron Mountain Hot Springs, creating a blend of adventure and tranquility unique to Glenwood Springs.\nMy home state, Jharkhand in India, also has a few hot springs. However, they\u0026rsquo;re so badly maintained and unclean that I hardly ever wanted to visit them. I\u0026rsquo;ve taken notes from Iron Mountain; let\u0026rsquo;s see what becomes of them!\nDay 8 - The Thrill on the Hill Giant swing at the Glenwood Springs Amusement Park.\nWhat could be more exhilarating than an amusement park perched on the edge of a mountain? The rides, terrifying yet thrilling, boosted my adrenaline levels to all-time highs.\nDay 9 - A Slice of Idaho US is darn pretty. 🥺\nOur penultimate day found us camping near Boise amidst flatlands where cows offered their own unique soundtrack to the starlit night.\nIt was the clearest sky I\u0026rsquo;ve seen in my life. In India, I could maybe see a hundred stars at best. Now, I can see thousands of them with my naked eye. It makes me appreciate the job of astronomers. It\u0026rsquo;s seriously like finding a needle in a haystack \u0026mdash; tracking their movements, even identifying them. (To identify them, I use SkyView Lite, a free app.)\nDay 10 - Wolf Creek Grand Finale Playing fetch with Frank. He didn\u0026rsquo;t give away this stick to me, ever. Maybe the game should be called throw, not fetch.\nWe saved the best for last - camping at Wolf Creek, arguably the most picturesque spot of our trip. With the ground full of daisies, clear water, and a clear sky, it was a fitting finale to an unforgettable journey that etched an indelible memory on our hearts.\nNature: A Never-ending Panorama of Vibrancy As I conclude this journey, a vibrant canvas of experiences, sights, and sounds unfurls behind me. From our first pawprints in Maupin to the grand finale at Wolf Creek, each day was a unique chapter in this unforgettable journey.
The thrill of white-water rafting, the serenity of the hot springs, the adrenaline rush of off-road driving and the giant swing, and the tranquillity under the Idaho starlight \u0026ndash; all these experiences have etched themselves into the core of my being.\nThis trip was more than just an escape into the wild; it was a passage into the history and scale of our vast, vibrant natural world, specifically the United States countryside. As I pack up my memories along with my camping gear, I carry with me a renewed sense of wonder and a heart full of gratitude for this magnificent life.\nUntil the next adventure, keep exploring, stay curious, and never stop appreciating the beauty of the world we live in.\nMap of All Places Here\u0026rsquo;s an interactive map showcasing all the places we\u0026rsquo;ve visited. Please note, the exact routes aren\u0026rsquo;t accurate, but each location\u0026rsquo;s placement is correct. For precise directions, you might prefer using Google Maps or Gaia. This map was created using Felt.com, a user-friendly tool for making interactive maps. Click on the points to learn more about each location.\nHappy exploring!\nMay all beings, living or non-living, visible or invisible, be at peace.\n","permalink":"/moab/","summary":"From dry deserts to panoramic mountain vistas, I experienced the unique blend of adventure, history, and tranquility in this 10-day trip through the vibrant canvas of the United States.","title":"My 10-Day Escape into Nature's Embrace: A Camping Journey to Moab"},{"content":"Hugo Apero is the Blogdown template I use for this website. The template ships with great defaults \u0026ndash; the best of all Hugo templates, in my opinion. Beyond those defaults, it provides many options to modify your website in a meaningful way. In this blog, I list out three good ones.\nFrom changing the color theme and adding a custom search bar, to customizing the fonts, each of these tweaks can significantly enrich your site\u0026rsquo;s aesthetic and functionality. So buckle up as we delve into the nuances of the Hugo Apero template, and let\u0026rsquo;s unlock the potential of your website!\n1. Change your Theme The colour theme for Hugo Apero can be changed by editing your config.toml or config.yaml file. The exact one would depend on your template version.\nIn your config.yaml file, you will find an option for theme. Edit it to suit your needs! There are six themes available by default.\n# use a built-in color theme # one of: forest / grayscale / peach / plum / # poppy / sky / violet / water theme = \u0026#34;violet\u0026#34; Michael McCarthy also created four new themes: Earth, Paper (a Grayscale alternative, which I like more), Magma (\u0026ldquo;dark\u0026rdquo; mode) and Primer (another dark one based on Github\u0026rsquo;s \u0026ldquo;primer\u0026rdquo; theme).1 As of today (April 16, 2023), I use \u0026ldquo;Earth\u0026rdquo;. If you want to use any of these, you will have to do some additional work.\nBefore I describe what you need to do to use them, take a look at these themes.\nI\u0026rsquo;ve found Magma and Primer (the dark themes, basically) to be bad at showing kable and DT tables. Just something to keep in mind.\nTo identify the changes needed to your site\u0026rsquo;s configuration, one can look at the pull request. Here are the steps:\nFirst, head over to assets/scaffold.scss and add the theme name you want to use to the following line.
{{$themes := (slice \u0026#34;earth\u0026#34; \u0026#34;forest\u0026#34; \u0026#34;grayscale\u0026#34; \u0026#34;paper\u0026#34; \u0026#34;peach\u0026#34; \u0026#34;plum\u0026#34; \u0026#34;poppy\u0026#34; \u0026#34;sky\u0026#34; \u0026#34;violet\u0026#34; \u0026#34;water\u0026#34;)}} Then create a theme_name.scss file in the assets/theme/ folder. You can look at the pull request for the exact content of the files. (Earth and Paper, Magma, and Primer)\nGo to your config.yaml or config.toml file and change the theme name as described earlier. (Whether it\u0026rsquo;s \u0026ldquo;=\u0026rdquo; or \u0026ldquo;:\u0026rdquo; depends on whether the file is TOML or YAML.)\ntheme: \u0026#34;sky\u0026#34; 2. Add a Search Bar Another simple addition is to add a custom search bar to your site. Having a search tab on a website is useful because it allows users to quickly and easily find the information they are looking for. It also helps users who may not be familiar with the website\u0026rsquo;s organization or navigation, allowing them to quickly locate information without having to spend time browsing through different pages.\nIt is super simple to do in Hugo Apero.\nHead over to cse.google.com. Create a new search engine by clicking \u0026ldquo;Add\u0026rdquo;.\nGive your custom search engine a cool name. Add your website\u0026rsquo;s URL in \u0026ldquo;What to Search\u0026rdquo;. Would you like to include images for search and use safe-search? Choose appropriately.\nFollowing this, you will get a public URL for your custom search engine. It would be something like \u0026ldquo;https://cse.google.com/cse?cx=7d9698c281e7d2001\u0026rdquo;.\nI\u0026rsquo;ve added my search box to the About page. To do that, you can head over to /content/about/_index.md. Then in the outro section, add the following HTML code.\noutro: | \u0026lt;i class=\u0026#34;fas fa-mug-hot pr2\u0026#34;\u0026gt;\u0026lt;/i\u0026gt;If my blog has helped you, you can [buy me a coffee](https://www.buymeacoffee.com/harsh17)! \u0026lt;script async src=\u0026#34;https://cse.google.com/your_search_url\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;div class=\u0026#34;gcse-search\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; Make sure that show_outro is set to true. Enjoy your new search engine!\n3. Changing Fonts This subsection largely follows the official documentation. There are six fonts included out of the box. You can choose them in config.toml or config.yaml.\ntextFontFamily = \u0026#34;courier\u0026#34; headingFontFamily = \u0026#34;baskerville\u0026#34; You like it spicy? Me too. Let\u0026rsquo;s play with some nontraditional fonts from Google Fonts.2 We will be using this Webapp to get the files in the right format.\nFirst, decide which fonts you want for your heading and text. This is likely going to take a lot of trials, but to understand the process, you can choose any from the Google Fonts site. Once you know which font you want, search its name in the Webapp. The app will show you the custom CSS as well as prepare the ZIP file. Create a static/fonts folder and add the font files (that you got from the Download files option) to that folder. Finally, head back to config.toml or config.yaml and edit the font name. customtextFontFamily = \u0026#34;\u0026#34; customheadingFontFamily = \u0026#34;Nanum Myeongjo\u0026#34; Push the commits to Github and enjoy your new fonts! If you\u0026rsquo;re not sure which fonts to choose, my advice is: experiment, experiment, and experiment.
Jonathan Hoefler, who created fonts for Apple, Obama\u0026rsquo;s Change campaign, and more, did an episode on typeface design: \u0026ldquo;Abstract: The Art of Design\u0026rdquo; (Netflix).3 The linked site is an introduction to typeface fundamentals.\nAnd that\u0026rsquo;s a wrap! In this post, we have traversed a number of essential changes that can truly enhance your Hugo Apero site, ensuring it stands out. From switching up the theme and adding a custom search bar, to refining the fonts, these tips serve to unlock the potential that resides in the Blogdown template.\nExperimenting is key \u0026ndash; so go ahead and try out various themes, search bars, and fonts until you find the perfect fit. Remember, your website is a reflection of you, so don\u0026rsquo;t be afraid to show off your style!\nThis is a pull request on Apero\u0026rsquo;s Github repo. At some point, I believe it will be merged into the theme. But that hasn\u0026rsquo;t happened in almost a year\u0026hellip;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nI\u0026rsquo;m truly grateful to Google for making such a huge database of fonts freely available. Like seriously\u0026hellip;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nHat tip to Dea for the recommendation.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/hugo/","summary":"This post provides a detailed guide to customizing the Hugo Apero template, focusing on changing themes, adding a search bar, and altering fonts for a truly personalized blogging experience.","title":"Up Your Blogging Game: Three Enhancements to Hugo Apero"},{"content":"Recently, I ran into a situation where I needed to compare two huge pandas data frames. When a data frame has five million rows and 100 columns, an all.equal-style element-wise check is not a good option. I had to do something creative.\nA straightforward solution was to break down the entire data frame into multiple smaller, manageable chunks and then do an element-by-element comparison. With five million rows, this would take me around ten tries, depending on some other factors like runtime memory, etc. But there must be a better method.\nBeginning as a brainstorming activity, I decided to ask ChatGPT for ideas. The rest of this blog is our conversation. Keep reading to learn about \u0026ldquo;hash collisions\u0026rdquo; and the SHA-256 hashing that underpins much of today\u0026rsquo;s internet.\nIf you\u0026rsquo;re interested in my thoughts and not this Python chat, jump to the \u0026ldquo;How does ChatGPT work?\u0026rdquo; section below.\nhow to compare if two data frames are same in pandas? tell me an efficient way as my file sizes are huge Comparing two large DataFrames in pandas for equality can be computationally expensive. However, you can use a few techniques to make this comparison more efficient. Here\u0026rsquo;s an approach that you can follow:\nFirst, compare basic DataFrame properties like shape, columns, and dtypes. If they don\u0026rsquo;t match, the DataFrames are not the same. If the basic properties match, perform a hash-based comparison for faster results.
Here\u0026rsquo;s a code snippet implementing this approach:\nimport pandas as pd\nimport hashlib\n\ndef hash_dataframe(df):\n    return hashlib.sha256(pd.util.hash_pandas_object(df, index=True).values).hexdigest()\n\ndef are_dataframes_equal(df1, df2):\n    # Check shape, columns, and dtypes\n    if df1.shape != df2.shape or (df1.columns != df2.columns).any() or (df1.dtypes != df2.dtypes).any():\n        return False\n    # Compare hashes\n    return hash_dataframe(df1) == hash_dataframe(df2)\n\n# Load your DataFrames\ndf1 = pd.read_csv(\u0026#39;file1.csv\u0026#39;)\ndf2 = pd.read_csv(\u0026#39;file2.csv\u0026#39;)\n\n# Check if DataFrames are the same\nresult = are_dataframes_equal(df1, df2)\nprint(\u0026#34;DataFrames are equal:\u0026#34;, result)\nThis approach first checks basic properties and then compares the hashes of the DataFrames, which is more efficient than element-wise comparison, especially for large DataFrames. Note that this method assumes your DataFrames have a deterministic order. If the row order is not guaranteed, you may want to sort the DataFrames by a unique index before hashing.\nThis got me interested in hashing, so I asked.\nWhat is hashing? Hashing is a process that takes an input (in our case, the DataFrame) and produces a fixed-length string (usually a sequence of numbers and letters), which is the output called the \u0026ldquo;hash\u0026rdquo; or \u0026ldquo;digest\u0026rdquo;. A good hash function has the following properties:\nIt is deterministic, meaning that the same input will always produce the same hash. It is fast to compute the hash for any given input. It is infeasible to regenerate the original input data from the hash, making it a one-way function. A small change to the input data should produce such a drastic change in the hash that the new hash appears uncorrelated to the old hash. It is unlikely that two different inputs will produce the same hash (called a collision). In the context of comparing DataFrames, we use a hash function to create a unique representation of the entire DataFrame. By comparing the hashes of two DataFrames, we can quickly determine if they are the same or not. If the hashes are the same, it is very likely (though not 100% guaranteed) that the DataFrames are the same.\nIn the provided code snippet, we use the hashlib.sha256() hash function from the Python standard library, which implements the SHA-256 algorithm. This is a widely-used cryptographic hash function that produces a 256-bit (32-byte) hash.\nThe pd.util.hash_pandas_object() function from pandas is used to create a hashable representation of the DataFrame. It computes the hash for each row and returns a Series of hashes. We then hash this Series of hashes using SHA-256 to get a single hash that represents the entire DataFrame. This allows us to efficiently compare large DataFrames by just comparing their hashes.\nAlthough GPT doesn\u0026rsquo;t say this explicitly, SHA-256 also underpins much of the security plumbing that protects apps we love: Whatsapp, email, financial transactions, and my digital garden. (Strictly speaking, SHA-256 is a hash function, not encryption; when you hear \u0026ldquo;256-bit encryption\u0026rdquo;, that usually refers to AES-256, but the two routinely work together.) I had no clue data frames could be converted to hashes so easily.\nAlso, \u0026ldquo;hash collision\u0026rdquo; sounds like a term straight from The Hitchhiker\u0026rsquo;s Guide to the Galaxy.1\nWhat is the probability that the data frames are not the same if they have the same hash? The probability of two different DataFrames having the same hash (a hash collision) is extremely low but not impossible.
The probability depends on the hash function used and the size of its output (the number of bits in the hash). In the provided code snippet, we use the SHA-256 hash function, which produces a 256-bit (32-byte) hash.\nFor a well-designed hash function like SHA-256, the probability of a hash collision is roughly 1 in 2^(number of bits in the hash). In this case, since there are 256 bits in the hash, the probability of a collision is about 1 in 2^256, which is approximately 1.1579209 x 10^(-77).\nThis probability is incredibly small, so small that it is practically negligible in most use cases. To put this into perspective, it is far less likely than winning the lottery multiple times in a row or being struck by lightning multiple times.\nHowever, it\u0026rsquo;s essential to understand that while the probability is extremely low, it\u0026rsquo;s not zero. In applications where it is absolutely critical to detect every difference, you might want to perform an element-wise comparison as a final step after the hash comparison. But in most cases, hash comparison should be more than sufficient to determine if two DataFrames are the same.\nLet\u0026rsquo;s say I got false. How do I identify the difference between data frames? If the comparison returns False, indicating that the DataFrames are not the same, you can use the compare() function in pandas to identify the differences between the two DataFrames. The compare() function is available in pandas version 1.1.0 and later.\nHere\u0026rsquo;s an example of how to use the compare() function:\nimport pandas as pd # Load your DataFrames df1 = pd.read_csv(\u0026#39;file1.csv\u0026#39;) df2 = pd.read_csv(\u0026#39;file2.csv\u0026#39;) # Compare DataFrames and find differences df_diff = df1.compare(df2) # Save the differences to a CSV file df_diff.to_csv(\u0026#39;differences.csv\u0026#39;) # Print the differences print(df_diff) The compare() function returns a new DataFrame showing the differences between the two input DataFrames. The output DataFrame will have a MultiIndex with two levels of columns. The first level corresponds to the original column names, and the second level contains two labels: \u0026ldquo;self\u0026rdquo; for the values in the first DataFrame (df1) and \u0026ldquo;other\u0026rdquo; for the values in the second DataFrame (df2).\nIf you want to display only the rows with differences, you can use the following code snippet:\nrows_with_diff = df_diff.dropna(how=\u0026#39;all\u0026#39;).stack().reset_index() rows_with_diff.columns = [\u0026#39;index\u0026#39;, \u0026#39;column\u0026#39;, \u0026#39;df1\u0026#39;, \u0026#39;df2\u0026#39;] print(rows_with_diff) This will output a DataFrame with the index, column, and the different values for each DataFrame, making it easier to identify the differences.\nHow does ChatGPT work? Last week, some of us researchers at the University of Tennessee, wrote up an essay explaining the basics of GPT models. It is for people who are like, \u0026ldquo;what the heck is this thing and how is it so good?\u0026rdquo;.\nChatGPT and similar LLMs are powerful AI generative models based on deep neural networks with multiple layers of transformer blocks, trained on diverse text sources and fine-tuned for specific tasks. ChatGPT can conduct human-like conversations, solve problems, and provide information related to a user\u0026rsquo;s question but may provide inaccurate information, misinterpret context, and perpetuate biases. 
LLMs, like ChatGPT, have limitations as they are only as reliable and accurate as the data they have been trained on, and can produce irrelevant or misleading responses due to misinterpretation or hallucination. Using ChatGPT at face value can hinder creativity, critical thinking, and problem-solving skills, and users should critically evaluate its output and consider its limitations. Check it out and share your thoughts!\nAI is developing fast. Light-speed fast. GPT models have been making huge strides. In February 2023, Meta launched LLaMA, a relatively small but capable language model, which was soon leaked to the public. The following month saw rapid innovations and developments, including minification efforts, fine-tuning on a laptop, the release of Alpaca, and the creation of GPT4All. By the end of March, open-source GPT-3 models were available, and multimodal training could be achieved in one hour. The open-source language model ecosystem continued to expand and become more accessible for users.\nIn April, the Koala dialogue model was launched, and it was shown that real humans couldn\u0026rsquo;t consistently tell the difference between it and ChatGPT. Why is Berkeley\u0026rsquo;s Koala critical? It was trained only on public data sources, including some cleverly collected ones. Some scoured ShareGPT.com to find only good responses and used them in the training dataset. This improvement in the underlying data more than translated to actual responses.\nOpen Assistant then released a model and dataset for Alignment via RLHF, which made reinforcement learning from human feedback more accessible to smaller experimenters. As a Google researcher wrote, the landscape of language models evolved rapidly, with the community no longer dependent on LLaMA and open-source solutions becoming increasingly sophisticated and competitive.\nWhen I started my brain dump on AI about a year ago, I hadn\u0026rsquo;t imagined things would accelerate this fast. A few months ago, when I tried to (not) make predictions about AI, I still hadn\u0026rsquo;t imagined this trajectory. But I like where it\u0026rsquo;s headed. Fast development usually means greater accessibility. In my mind, we should be far less concerned about \u0026ldquo;AI Takeoff\u0026rdquo; than unequal access to AI. The future is already here \u0026ndash; it\u0026rsquo;s just not evenly distributed.2\nAs I was telling Greeshma, this book is the epitome of Sci-Fi. It was my first space adventure book and it delivers. If you haven\u0026rsquo;t already, read it. And then watch the movie. Not the other way around.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWilliam Gibson riffs on writing and the future | (ted.com)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/pandas-compare-df/","summary":"I needed to compare two data frames with over five million rows. I asked ChatGPT for help. I ended up learning about SHA-256 and hashes.","title":"Comparing two BIG data frames in Pandas"},{"content":"Introduction Many people may ask what ChatGPT is. To answer this question, we asked ChatGPT to \u0026ldquo;define what ChatGPT is.\u0026rdquo; Here is the definition ChatGPT provided:\nChatGPT is a large language model trained by OpenAI, based on the GPT (Generative Pre-trained Transformer) architecture.
It uses deep learning techniques to generate human-like responses to text-based inputs, such as chat messages, emails, or social media posts.\nChatGPT has been trained on a vast amount of text data and is capable of understanding natural language and context, allowing it to provide accurate and coherent responses to a wide range of questions and topics. Its capabilities make it a useful tool for various applications, such as chatbots, customer service, and language translation.\nThis definition means that ChatGPT is an AI-powered chatbot that uses Large Language Models (LLMs) to produce a response. LLMs are trained on massive databases of knowledge or content, allowing them to predict the next \u0026ldquo;word\u0026rdquo; in a sentence. They operate at incredible speeds and integrate information from their training database with what they recall from previous conversations with the user. A \u0026ldquo;word\u0026rdquo; could be a literary word, piece of code, or any other mode of input.\nLLMs have wide applications in the industry and can automate many manual tasks. See GPTs are GPTs: An early look at the labor market impact potential of large language models (openai.com) for an analysis of the labor market impact of LLMs. As LLMs evolve, spurred by the greater capabilities of AI, we can anticipate they will have a significant impact on our daily lives and professional endeavors.\nBelow is additional information about how LLMs such as ChatGPT work, examples of how it can be used, and example responses showing that it is not ideal for every topic.\nChat GPT: Technical Details ChatGPT belongs to a broader class of generative models (see Appendix for details on generative models). It is based on a deep neural network that processes input data. Per ChatGPT, a neural network is \u0026ldquo;a computational model that is inspired by the structure and function of the biological neurons in the brain. It is an artificial intelligence technique that is widely used in machine learning, pattern recognition, computer vision, natural language processing, and other fields.\u0026rdquo; A neural network has multiple layers of what are called transformer blocks. These transformer blocks contain several parts that are adept at focusing on distinct segments of the input text using self-attention.\nTransformers with self-attention are programmed to capture intricate patterns in the data. Thus, LLMs develop a response based on the patterns the transformer blocks identify. This type of learning enables ChatGPT and other LLMs to produce answers to prompts quickly.\u0026quot; A neural network can be trained to perform a wide range of tasks, such as image classification, object detection, speech recognition, language translation, and text generation.\u0026quot; However, LLMs have limitations.\nWhat ChatGPT (and LLMs) Can Do One of the fascinating features of ChatGPT is its ability to generate responses to user input using a language model trained on a vast corpus of human knowledge. 
The training data for ChatGPT is sourced from a diverse range of text sources, such as books, articles, and web content, and engineers preprocess the data using natural language processing tools to eliminate irrelevant information and ensure consistency.\nThe training process involves fine-tuning the model\u0026rsquo;s parameters to minimize the difference between the predicted output (what ChatGPT produces) and the actual output (a good response), which can take several weeks or even months, depending on the training dataset\u0026rsquo;s size and the model\u0026rsquo;s complexity. To choose the best response from several responses, human feedback is provided as positive reinforcement. (See the Appendix to learn about Reinforcement Learning with Human Feedback.)\nModels pre-trained on large amounts of data, such as ChatGPT, are potent tools and available for general use. They can be fine-tuned for specific tasks by training them on additional data, resulting in a specialized fine-tuned model. Fine-tuned models are adept at handling specific tasks, as they have been tailored for a smaller, more specific dataset. This fine-tuning is technically called \u0026ldquo;few-shot learning.\u0026rdquo; See the Appendix for additional information.\nOnce ChatGPT (and other LLMs) are trained and tested, they can be used for a range of purposes, from generating text to helping solve mathematical and coding problems. Because ChatGPT is a learning model, it learns from its conversations with users, and its answers can vary.\nThe following three examples show what ChatGPT can do.\n1. Solving a Quadratic Equation Figure 1 shows ChatGPT understanding the user\u0026rsquo;s request for help with a math problem and providing the solution to the quadratic equation the user needs. By going through each step, ChatGPT demonstrates its proficiency in solving mathematical problems.\nFigure 1: ChatGPT helps the user solve a quadratic equation.\nWhile ChatGPT arrived at the correct answer in Figure 1, it doesn\u0026rsquo;t mean that ChatGPT provides the correct answer every time. In fact, it still stumbles on basic math. In this tweet, a user shows how wrong ChatGPT\u0026rsquo;s arithmetic answers can be.\nChatGPT recognizes and understands the context using its language model to analyze the surrounding text. As a result, it can carry out a more sophisticated conversation using the previous interactions with the user.\nFurthermore, ChatGPT is constantly learning and improving based on user interactions. As more users interact with it, it improves its ability to understand the context and provide more accurate responses.\n2. Creating an R Shiny App In Figure 2, the user asked ChatGPT to create an R Shiny app to demonstrate the Central Limit Theorem of statistics. In its first attempt, it got the wrong answer. However, when the user pointed out what it missed, it apologized and corrected itself! Thus, the user should always review the answer ChatGPT provides to ensure it is correct and complete.\nFigure 2. ChatGPT gets a wrong answer, apologizes, and corrects itself.\n3. Finding definitions and references Figures 3 and 4 show the conversation between ChatGPT and users who asked for a definition and followed with a request for references. In the first instance, when ChatGPT defines itself, it does provide specific references, but they need to be verified because when ChatGPT hallucinates, it may generate a made-up citation.\nFigure 3.
ChatGPT can present references but the user needs to verify if they\u0026rsquo;re real, or if it\u0026rsquo;s hallucinating.\nIn the second instance, when asked to define a Morality Play, ChatGPT provides a definition and some examples. It does not provide a specific reference for the definition, though, simply places to look for the definition.\nFigure 4: ChatGPT defines Morality play and provides some general references but warns user to check them herself.\nWhat ChatGPT (and LLMS) CANNOT Do While LLMs possess remarkable capabilities in generating human-like responses, they have limitations. They are limited to the data they have \u0026ldquo;read,\u0026rdquo; or been trained on, and merely read patterns. They do not think creatively or critically about the data.\nOne notable shortcoming of ChatGPT (and other LLMs) is its inability to guarantee the reliability and accuracy of the information provided. As an AI model trained on vast textual data, it may inadvertently reproduce incorrect or outdated information, which can be particularly problematic for users seeking trustworthy answers in academic or professional contexts. In Example 3, figure 3, ChatGPT responded: \u0026ldquo;As an AI language model, I don\u0026rsquo;t have access to future references.\u0026rdquo; ChatGPT\u0026rsquo;s training data is limited to September 2021, thus a 2022 article is not in its knowledge base or the data it read, so it still sees that as the future.\nAdditionally, ChatGPT can sometimes misinterpret context or miss out on subtle nuances, leading to responses that may be irrelevant, misleading, or inappropriate; it is also prone to hallucinations. For LLMs, hallucinations are outputs that are not based on actual data but rather patterns and associations it has learned from the training data. For a detailed explanation of hallucinations, see the Appendix.\nOverfitting also affects large language models by causing them to produce responses that are too specific or biased towards the training data, reducing their ability to generalize to new, unseen situations. As a result, LLMs may fail to provide accurate or relevant answers, exhibit excessive verbosity, or perpetuate biases present in the training data. Learn more about overfitting in the Appendix.\nAnother area where ChatGPT falls short is in fostering creativity and critical thinking skills. Although AI can generate content and suggest ideas, it does not offer innovative or original insights. Relying on AI-generated content may hinder users\u0026rsquo; creative thinking and problem-solving abilities.\nEthical concerns also arise from using LLMs like ChatGPT, as they may perpetuate existing biases in the training data, potentially leading to unfair conclusions. For example, asking ChatGPT to provide information on a controversial topic could generate biased responses. These limitations should be carefully considered when using ChatGPT and similar AI models for various tasks. Evaluating and thinking critically about the output or response any LLM returns is important for developing a balanced approach, keeping in mind that while it may provide accurate answers or solve certain problems correctly, it has potential risks and drawbacks.\nConclusion In the ever-evolving landscape of artificial intelligence, ChatGPT and similar systems emerge as powerful and versatile LLMs rooted in deep neural networks. 
Comprising multiple layers of transformer blocks, this behemoth is adept at predicting the succeeding \u0026ldquo;word\u0026rdquo; in any given sentence, drawing from an immense repository of knowledge.\nDespite its proficiency, ChatGPT may not always provide accurate information; it might misinterpret context, hinder the user\u0026rsquo;s creativity and critical thinking skills, and perpetuate biases or outdated information.\nKey Insights ChatGPT and LLMs belong to a class of AI generative models based on deep neural networks with multiple layers of transformer blocks, which are trained on a diverse range of text sources and fine-tuned for specific tasks using additional data.\nChatGPT can conduct human-like conversations, solve math and coding problems, and provide definitions, references, and information related to a user\u0026rsquo;s question. It can improve its learning through interacting with users.\nChatGPT limitations include providing inaccurate information, misinterpreting context, and creating hallucinations. It also cannot provide information it has not been trained on or learned about through interactions (see Example 3). Additionally, using ChatGPT at face value can hinder creativity, critical thinking, and problem solving skills.\nChatGPT has ethical concerns, including the perpetuation of existing biases in training data, potentially leading to unfair conclusions. Therefore, users should critically evaluate its output and consider its limitations.\nUsers should consult credible sources to verify the information in doubt and to ensure that the references it generates on a topic or a problem are based on reality or on \u0026ldquo;actual data\u0026rdquo; it has learned instead of hallucinations (see the Appendix).\nAppendix Here is further information on some of the terms used above.\nFew-shot Learning Few-shot learning is a subfield of machine learning that focuses on the ability of a model to learn new concepts and generate new examples by generalizing from only a few labeled examples.\nOne approach to few-shot learning is to use meta-learning, which involves training a model on multiple learning tasks so that it can continuously learn from only a few examples. For example, a model could be trained to recognize handwritten digits. Then, given a new set of digits with only a few labeled examples for each digit, the model could quickly adapt and learn to recognize the new digits based on what it has already learned from previous training.\nAnother approach is to use data augmentation techniques to generate new examples from existing labeled data (for example, by rotating or mirroring images in image datasets, or rephrasing sentences in text datasets). This improves the model\u0026rsquo;s generalization capabilities and reduces the risk of overfitting. (See Appendix for learning more about overfitting.)\nGenerative Models A generative model is a type of machine learning model designed to generate new data that is similar to its training data. 
Unlike discriminative models that are used to classify, categorize or label input data, generative models are designed to learn the underlying patterns and distribution of the data and generate new samples that follow the pattern and distribution.\nHallucinations Hallucinations in AI learning refer to when an AI system generates or outputs information that is not based on reality, or on actual data, but rather on the patterns and associations it has learned from the training data.\nFor example, an image recognition system trained on a dataset of dogs might generate an image of a \u0026ldquo;dog\u0026rdquo; that has multiple heads or legs, simply because it has learned that the presence of certain features (e.g. fur, ears) are strongly associated with the label \u0026ldquo;dog\u0026rdquo; in its training data, without understanding what a real dog actually looks like.\nIn some cases, these hallucinations can be harmless or even amusing. In other cases, they can have serious consequences, such as when an autonomous vehicle \u0026ldquo;sees\u0026rdquo; a non-existent object and causes an accident or when a chatbot generates offensive or harmful responses.\nOverfitting Overfitting is a situation where an AI model learns to perform very well on the training data but doesn\u0026rsquo;t perform well on new, unseen data. This occurs when the model learns the training data too well, capturing not only the underlying patterns but also the noise or random fluctuations in the data.\nIt\u0026rsquo;s like memorizing the answers to a specific set of questions but struggling when faced with different questions on the same topic. As a result, the model doesn\u0026rsquo;t generalize well to new situations and has limited real-world applicability.\nReinforcement Learning with Human Feedback Reinforcement learning with human feedback is a method where an AI model learns to make better decisions by receiving guidance from humans. The AI model performs actions, and humans provide feedback on the quality of those actions. The model then uses this feedback to improve its future decisions.\nThink of it like teaching a pet: when the pet does something right, it gets a treat or praise, and when it does something wrong, it gets corrected. The pet learns from this feedback and gradually improves its behavior. Similarly, the AI model learns from human feedback and enhances its performance over time.\nSelf-attention Self-attention is a mechanism used by some AI models to understand the relationships between words in a given text. It helps the model decide which words are more important or relevant to each other in a specific context. By focusing on these relationships, the model can better comprehend the meaning of the text and generate more coherent and accurate responses.\nYou can think of self-attention as a way for the AI model to pay different levels of attention to different words, based on how they relate to each other in the context of the entire sentence or paragraph.\nAcknowledgements We are a group of faculty and students from the University of Tennessee, Knoxville who came together to write an explanatory article on how ChatGPT works. 
Our goal is to help UT’s students, faculty and staff and the public develop a more informed understanding of AI and of ChatGPT’s functions.\nWe thank Prof Chuanren Liu (Department of Business Analytics and Statistics) for coordinating the group and for providing valuable insights into this article.\nAbout the Authors Harshvardhan is a PhD candidate in the Business Analytics and Statistics at the Haslam College of Business. You can reach him at harshvar@vols.utk.edu.\nSally Corran Harris is a Distinguished Lecturer and the Associate Director of Undergraduate Studies in the Department of English. You can reach her at sallycharris@utk.edu.\nDania Bilal is a Professor at the School of Information Sciences. You can get in touch with her at dania@utk.edu.\nLena Shoemaker is a Writer and BA Candidate (English) in the Department of English. You can email her at hshoema2@vols.utk.edu.\nAlexander Yu is a BS Candidate (Computer Science) in the Department of Electrical Engineering and Computer Science. You can reach him at ayu5@vols.utk.edu.\nAll the authors are affiliated to the University of Tennessee, Knoxville.\nAdditional Resources What Is ChatGPT Doing \u0026hellip; and Why Does It Work?\u0026mdash;Stephen Wolfram Writings\nIntroducing ChatGPT (openai.com)\nWould Chat GPT Get a Wharton MBA? A Prediction Based on Its Performance in the Operations Management Course\nGPTs are GPTs: An early look at the labor market impact potential of large language models (openai.com)\nFuture Tools - Find The Exact AI Tool For Your Needs\n","permalink":"/gpt/","summary":"Curious about ChatGPT, the AI chatbot that\u0026rsquo;s making waves? Dive into this article to learn how it generates human-like responses and its many applications. Get insights into both its strengths and limitations, while understanding why it\u0026rsquo;s essential to approach its responses with a critical eye.","title":"How does GPT work? Understanding Generative AI Models"},{"content":"\nImagine chatting with your pet cat, discussing non-violence with Mahatma Gandhi, or seeking therapy from a virtual counselor named Isha. I created a GPT Chatbot which lets you engage in conversation with various personalities from history and beyond, all from the comfort of your own Terminal.\nCurious to know how it works? I\u0026rsquo;m here to walk you through the entire process, step by step. Avast ye, let\u0026rsquo;s weigh anchor and set sail!\nGetting Started First things first, you\u0026rsquo;ll need to get your hands on the chatbot. Simply follow these straightforward steps:\nClone the repo or download the zip from GitHub (look for the green button on the top right).\nInstall the requirements by running pip install -r requirements.txt. You must execute this in the same Python environment as your usual Python.\nFor Anaconda users, this would mean running conda env list to get list of all environments. Then choose the right environment with conda activate env_name.\nObtain OpenAI API keys from https://platform.openai.com/account/api-keys and place them in the Python script called openai_keys_user.py. (The benefit of keeping keys in a separate file is that you can share the app without its keys.)\nNavigate to the directory where you downloaded the repository. You can use cd path/to/folder for this.\nRun the script with python3 app.py.\nThat\u0026rsquo;s it! 
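Before diving in, here is roughly the shape of what you just installed. This is a minimal sketch, not the repo's exact app.py (the real functions are walked through in the next section); it assumes the pre-1.0 openai Python SDK that was current when this was written, and the system prompt is made up:

import openai

openai.api_key = "sk-..."  # in the real app, the key lives in openai_keys_user.py

def response(messages):
    # Ask GPT-3.5-turbo for the next turn, given the conversation so far.
    reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    message = reply["choices"][0]["message"]
    return message["content"], message["role"]

def add_message(messages, content, role):
    # Append one turn as a {"role": ..., "content": ...} dictionary.
    messages.append({"role": role, "content": content})
    return messages

# Seed the personality with a system message, then loop until the user exits.
messages = add_message([], "You are Gary, a sarcastic but affectionate cat.", "system")
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "exit":
        break
    messages = add_message(messages, user_input, "user")
    content, role = response(messages)
    messages = add_message(messages, content, role)
    print("Gary:", content)

Swap the system message and you have a different personality; that is really all there is to it.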
Now you're ready to choose a personality to talk to and have a blast. Remember, if you need help, feel free to ask ChatGPT or comment below.

Mahatma Gandhi

Gary, my Cat
I really want to have a cat, but it's too much responsibility. Further, wild >> pets.

Gary is fascinating to talk to. I've noticed that talking to the cat (cat_prompt.txt) is especially enjoyable when using GPT-4, though the current app is based on GPT-3.5-turbo. Once I gain access to the GPT-4 API, I'll be sure to update the app accordingly.

I asked Gary to describe itself.

How does it work?
You might be wondering how this chatbot works. Well, it's all thanks to the OpenAI API, which is responsible for generating text. Here's app.py: gpt-chatbot/app.py

The code outlines the core functions needed to run the chatbot, generate responses, and manage conversation flow. The primary function in this implementation is response(), which takes a list of messages as input and generates a response using the GPT-3.5-turbo model from OpenAI.

The messages are structured as a list of dictionaries containing a "role" (either "system", "user", or "assistant") and the message "content". The function returns the generated message and its role. To add a message to the list of messages, the add_message() function is used, taking the current list of messages, a new message, and the role associated with the message as arguments.

The execute_chatbot() function handles user interaction, offering a choice of chatbot personalities and managing the conversation loop. Depending on the user's choice, a system message is created to define the chatbot's personality.

Then the conversation loop begins, using the response() and add_message() functions to generate responses and manage the flow of the conversation. The user can type 'exit' at any time to end the conversation.
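Since the post describes app.py's structure without quoting it, here is a minimal sketch of that shape, assuming the legacy openai library's ChatCompletion interface that GPT-3.5-turbo shipped with. The actual code in the repo may differ; the key variable imported from openai_keys_user.py and the single hard-coded personality are assumptions for illustration.

import openai
# the post keeps keys in a separate file; the variable name here is an assumption
from openai_keys_user import OPENAI_API_KEY

openai.api_key = OPENAI_API_KEY

def response(messages):
    # send the whole conversation so far; return the assistant's reply and its role
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    reply = completion.choices[0].message
    return reply["content"], reply["role"]

def add_message(messages, message, role):
    # append one {"role": ..., "content": ...} dict to the running conversation
    messages.append({"role": role, "content": message})
    return messages

def execute_chatbot():
    # a single hard-coded personality stands in for the menu the post describes
    messages = add_message([], "You are Gary, a philosophical house cat.", "system")
    while True:
        user_input = input("You: ")
        if user_input.strip().lower() == "exit":
            break
        add_message(messages, user_input, "user")
        content, role = response(messages)
        add_message(messages, content, role)
        print(f"Gary: {content}")

if __name__ == "__main__":
    execute_chatbot()

The design point worth noting is that the model is stateless: the whole message list is resent on every turn, which is why add_message() keeps appending both sides of the conversation.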
What's Next?
In wrapping up, this chatbot serves as a fun and adaptable space for users to engage with a variety of AI personalities, such as a wise therapist, renowned historical figures, and even a quirky talking cat. The showcased code is a testament to how straightforward and efficient the process can be with the help of GPT-3.5-turbo and OpenAI's API.

As I'm always looking for ways to improve and expand this chatbot, I would love to add more personalities to chat with. If you have any suggestions or ideas for new characters and improvements, don't hesitate to open an issue on the GitHub repo.

ChatGPT and Gandhi's advice were used in writing this post.

","permalink":"/gpt-chatbot/","summary":"Building a customizable chatbot that brings unique characters to life","title":"Crafting Conversations with GPT Personalities in Python"},{"content":" Around half a year ago, Twitter made an exciting announcement about a new paid subscription service called Twitter Blue. For $8 per month (or $11 per month via mobile purchase), users can sign up for a host of exclusive features, including a blue badge, prioritized conversation ranking, fewer ads, bookmark folders, custom navigation, tweet editing, undoing tweets, and more.

As handy as these features are, the service is not yet fully available to all users: it can only be purchased via the iOS app or the web, not on Android. It's unclear what the tangible benefits for common users are, except for a visibility boost. In my opinion, being able to edit tweets is unnecessary — you don't need the ability to edit, you just need to forgive yourself.

While launching a half-baked service is not a new phenomenon in the tech industry, it's concerning for a utility service like Twitter, which has a global user base, not to have an Android version. Twitter's largest market is India, which is predominantly an Android market. Moreover, the prices for Twitter Blue are quite steep — who would pay ₹9,400 per year for a social media platform?

Starting today, the original Twitter verification marks are gone. On April 1, Twitter is stripping away the legacy verification badges from the platform in favor of the paid badges associated with Twitter Blue subscriptions. Then, starting April 15, the platform apparently will no longer promote non-paying Twitter users via its recommendation algorithm on the For You feed. (The inability to participate in polls sucks.)

While these changes may seem concerning, they are part of Twitter's ongoing efforts to increase revenue and create a more sustainable business model. The majority of Twitter's past earnings came from advertisements.

Annual revenue of Twitter from 2010 to 2021, by segment. Advertising revenue has been increasing while data licensing revenues are relatively constant. Source: Statista.

It remains to be seen how these changes will affect the user experience on the platform and whether they will be beneficial in the long run.

Legacy verification badges on Twitter provided credibility and legitimacy to users, but with the introduction of paid badges, some may view the process as exclusive and biased towards those who can afford it.

The decision to no longer promote non-paying users via Twitter's recommendation algorithm on the For You feed has raised concerns among influencers. This change may disproportionately affect smaller accounts and marginalized communities, who may not have the resources to pay for Twitter Blue subscriptions.

But not all are excited about it. In fact, most Twitter Blue subscribers are nothing close to influencers — over 20% have fewer than a hundred followers. Notable figures and outlets, from LeBron James to the White House, have said they won't be paying for verification.

So, any Tom, Dick and Harry will have the blue checkmark, but not the government agencies, celebrities, and influential figures. We're gonna see a return of "real" prefixes in profile names.

Who uses Twitter Blue?
Travis Brown has collected data on Twitter Blue users since its launch.

We compiled this list by combining two approaches. The first uses a Twitter profile scraper that is one of the components of the Hassreden-Tracker project, which was supported by Prototype Fund in 2022. The second involves searching the Twitter API for tweets by Twitter Blue subscribers, with queries designed to cover areas of the Twitter graph that the first approach may miss (for example non-English-language accounts).

I thought it would be interesting to see who they are. Here's the exploration! You can download the R Markdown from my GitHub.

Reading in the data
Data source: https://github.com/travisbrown/blue.

Since the data doesn't have column names, I will add them. Using janitor, I will clean the names. Its clean_names() function is an absolute blast: it converts CAPITALS and spaces to small_letters_with_underscores. Pretty standard.
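The original workflow is R Markdown, but the same step is easy to picture in any language. Here is a hedged Python/pandas rendering of it for illustration; the file name is hypothetical, and the renaming snippet is a bare-bones stand-in for what janitor::clean_names() does in R.

import pandas as pd

# the dump ships without a header row, so supply the column names up front
# (the file name is hypothetical; the data lives at github.com/travisbrown/blue)
cols = ["account_id", "screen_name", "legacy_verification_status",
        "follower_count", "date_blue_sub", "time_blue_sub", "sub_status"]
blue = pd.read_csv("blue_subscribers.csv", names=cols)

# a minimal stand-in for janitor::clean_names(): lowercase, underscores for spaces
blue.columns = (blue.columns.str.strip().str.lower()
                            .str.replace(" ", "_", regex=False))
print(blue.head(10))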
See the code on my GitHub.

Data
Here are the first ten rows of the data frame. I am using kable for printing a good-looking table. (Later on, I will use DT for an interactive table where you can sort, filter and search.)

account_id  screen_name   legacy_verification_status  follower_count  date_blue_sub  time_blue_sub  sub_status
12          jack          V                           6548240         2022-11-10     1668066884     U
18          Adam          NA                          4               2022-11-10     1668092307     S
22          rabble        NA                          18606           2022-11-10     1668111426     U
41          drx           NA                          130             2023-03-03     1677853595     B
58          Darkside      NA                          6065            2022-11-18     1668748244     B
59          Tim535353     V                           9369            2022-11-11     1668139623     B
76          marciadorsey  V                           19598           2022-11-11     1668142394     B
294         ario          NA                          5783            2022-11-10     1668076741     B
295         joshk         V                           149304          2023-02-23     1677191326     B
324         chrisfralic   V                           41137           2022-12-16     1671171125     B

Here's a brief description of the columns.

account_id — Account identifier. Example values: 12, 18, 22.
screen_name — Username. Example values: jack, Adam, rabble.
legacy_verification_status — B for Business accounts, G for Government accounts, and V for Verified but type not specified. Example values: B, G, V.
follower_count — How many followers they have. Example values: 6548240, 4, 18606.
date_blue_sub — Date they first got Twitter Blue. Example values: 2022-11-10, 2023-03-03.
time_blue_sub — Time they first got Twitter Blue. Example values: 1668066884, 1668092307.
sub_status — Current Twitter Blue status: B for Subscribed to Twitter Blue, U for Unsubscribed, S for Permanently suspended, D for Self-deactivated. Example values: B, U, S, D.

I find it funny that Jack Dorsey, the founder of Twitter, doesn't have the first account. Who got it? Some engineer on his team?

Let's dive into the analysis.

Popularity of Twitter Blue
Twitter Blue added the most users in the first two weeks of launch. The next peak is in the second week of 2023. What's that for? Tell me in the comments, if you know.

Who are the subscribers?
Blue Subscribers with Most Followers
The list of Blue subscribers is pretty interesting.

Here's the list of Blue subscribers with over a million followers. I am filtering to only the users who are still subscribed to the service. (So users who "tried" the service for a month aren't included.)

To make the table interactive, so users can sort and search, I used DT. I like DT for its simplicity. Its datatable() function is great for creating interactive tables easily. I've tried picking up several other table packages in the past; kable and gt are good for beautiful tables, but they're not interactive.

Of the top-10 most popular accounts on Twitter, only Elon Musk is a subscriber. This kinda speaks to the popularity of the service.

Twitter accounts with most followers worldwide as of January 2023. All numbers are reported in millions. Source: Statista.

How many followers do Blue subscribers have?
For this task, I am going to break down the follower count into smaller groups. Since most Blue subscribers do not have a huge fan following (Elon Musk, again, is an exception), this is a necessary step — otherwise, histograms wouldn't look relevant.

Number of Followers  Number of Blue Subscribers
0-100                110K
100-1K               194K
1K-10K               184K
10K-100K             71K
100K+                14K

Blue seems to be more popular among the less popular accounts on Twitter. That's interesting.
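Continuing the pandas sketch from above (again, the original analysis is in R), this is roughly how those buckets could be computed; the bin edges follow the table, and the counts will of course depend on the data you load.

import pandas as pd

# bucket boundaries mirror the table above; right-open so exactly 100 lands in "100-1K"
bins = [0, 100, 1_000, 10_000, 100_000, float("inf")]
labels = ["0-100", "100-1K", "1K-10K", "10K-100K", "100K+"]

blue["follower_bucket"] = pd.cut(blue["follower_count"],
                                 bins=bins, labels=labels, right=False)
print(blue["follower_bucket"].value_counts().sort_index())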
How many accounts have fewer than 10 followers? 25,550 accounts have fewer than 10 followers. Out of those, 3,480 have zero followers. Who are these people, and what's the value of Twitter Blue to them? Is it just an act of curiosity — are they simply early adopters of the service? Or are they in strong defiance of old Twitter, grabbing the opportunity of premium Twitter as soon as it shows up? Maybe they're just very expensive bots.

Here's a histogram of the number of followers for accounts with fewer than 1,000 followers.

This is especially interesting because the average number of followers for an active Twitter user is around 159 (in the US). For someone willing to pay extra for a service, you would expect them to be heavy users. You wouldn't expect them to be lurkers; they'd be core users.

On the other hand, 391 million Twitter accounts have no followers at all.

Types of Accounts: Business, Government and Society
It feels like almost all the subscribers are Musk fanboys. There are almost no government accounts, very few business accounts and a few celebrities (who probably wanted to try the Blue service).

How many of the original Blue subscribers are still using it?
The data has a column called sub_status which has this detail.

Who left Twitter Blue?
A vast majority of unsubscribers (122,823) are regular users. My guess is they were experimenting with the service as early adopters. 499 of them were verified accounts who later chose not to continue their verification. 21 of the unsubscribers are businesses, and eight of them are government agencies.

Here's the list of government agencies: TimWattsMP, UNDPEurasia, RepRaulGrijalva, DenverOEM, JoaquinCastrotx, TDEM, SteveScalise, EPAMichaelRegan.

Conclusion
Twitter Blue, the premium subscription service offered by Twitter, saw a massive surge in subscriber numbers immediately following its announcement. However, this momentum was short-lived, and the rate of new subscriber additions declined significantly, with one exception: in January, something extraordinary happened, which resulted in a notable increase in new subscribers.

It's worth noting that out of Twitter's top-10 most followed accounts, only Elon Musk has a Blue account. While some celebrities and businesses do have Blue accounts, they are vastly outnumbered by regular Twitter Blue users.

Interestingly, a significant percentage of the group that initially signed up for Blue has already left the service. However, it's difficult to determine the exact number, as Twitter apparently is not actively removing verified badges even after people stop their subscriptions.

What are your thoughts on this? Do you find these trends surprising or expected? Share your insights in the comments below!

The code for this project can be found on my GitHub.

","permalink":"/twitter-blue/","summary":"In this blog post, I explore who are the Twitter Blue subscribers. It is not celebrities, businesses or governments. It is our regular old Joe with fewer than a hundred followers.","title":"Who are Twitter Blue Users?"},{"content":" Many people ask me, why didn't you continue your father's profession as a painter? My answer is, I did. My father's painting was a few centimeters in size. My painting is thousands of square kilometers in size.

— Liu Thai Ker, designer of Singapore city

Liu Thai Ker explains that he did continue his father's profession as a painter, just in a different way: while his father's paintings were a few centimeters in size, Liu's are thousands of square kilometers. The quote is an analogy for his work as a city designer.

When Singapore became independent in 1965, it was marred by poverty and squatters.
The government decided to build good-quality affordable housing for everyone, and within thirty years all the squatters were gone and Singapore had enough housing for everyone.

Liu looked for "design genes" in creating a new city, such as customs, culture, and environment. For example, in many Asian countries, including Singapore, strong colors are not common, as even strong colors become pastel colors under the strong sun.

Pastel-colour buildings in Singapore. Bright colours turn to pastel colours due to the amount of sunshine.

Sunlight is an important factor for healthy living. The height of buildings is crucial for allowing the appropriate amount of sunlight to reach all homes, so Liu considered building heights when designing cities. The height of a building depends on the position of the city itself and its surroundings. Liu used a chessboard-like design when planning the location of apartment complexes, where every black block is a building and every white area is a garden or low-rise convenience market.

Even the height of buildings can be determined by the geographical location of the city. Samuel Hughes did a wonderful exposition on how tall buildings can be, depending on the city itself. Most of the Bay Area has a ceiling of two storeys. Growing up in London, Samuel believed it had to be at least four or five, since that is the norm for Georgian terraces.

Georgian structures with three floors.

Visiting Manhattan made him realise it might be way more. The height of a building cannot be determined without considering the building's surroundings. For example, the width of the street plays a role.

Take a look at this side-by-side comparison between a pre-1916 street in New York and a post-1916 avenue. The biggest difference is that the streets are allowed to be wide.

In many American cities, grocery stores are far from residential areas. As a result, families have to buy enough groceries for two to three weeks, including items that should be bought fresh, such as bread, vegetables, and fruits. This situation has led to an increase in processed food consumption. Companies have been incentivized to add more preservatives to their items so that they do not spoil so quickly.

Johnny Harris did an interesting comparison of bread from the US and France. France has 30,000 independent bakeries, while the US has only 3,000. Watch it on YouTube.

In the US, zoning has created sprawling suburbs with neighborhoods of single-family homes, leading to longer commutes and increased traffic. This wasteful and costly land-use pattern is a result of separating land uses into distinct zones.

European countries, where zoning actually originated, took a different approach. They emphasized mixed-use development and limited the number of zones. Germany, for example, has only four basic zone types (residential, mixed, commercial, and special), each with its own subclasses. This leads to more walkable, charming cities where people can access essential services without relying on cars, and it reduces the carbon footprint of cities.

Transportation safety matters too. This could be the subject of another blog, but you should check out the City Beautiful channel on YouTube. I will just give you one case in point: arterial road design. Arterials are among the deadliest roads for pedestrians and cyclists.
They account for 12% of all roads and 57% of all traffic fatalities.

Arterial road design is a type of road network design that involves the construction of high-speed, multi-lane roads with limited access points. These roads are designed to handle large volumes of traffic and to provide fast and efficient travel between destinations. However, studies have shown that arterial roads are also associated with a higher risk of accidents and fatalities.

In conclusion, urban planning is a crucial aspect of any city's development. The design of cities can have a significant impact on the health, safety, and well-being of their residents. While some urban planning decisions may seem harmless, such as arterial road design, their impact on public safety cannot be ignored.

As we continue to shape and reshape our cities, it is important to consider the long-term effects of our choices. Only by prioritizing the needs of the people and the environment can we create truly sustainable and livable cities for generations to come. Remember, the cities we build today are the legacy we leave for tomorrow.

","permalink":"/cities/","summary":"Height of the building determines the sunlight exposure, chess-block organisation helps with high-rise congestion and how zoning laws are hurting Americans.","title":"Planning Cities with People"},{"content":""Yoga is the restraint of the fluctuations of the mind" — Patanjali

The Yoga Sutras of Patanjali are a foundational text of the yogic tradition, offering guidance on how to cultivate a state of equanimity and inner peace through various practices and techniques.

As Patanjali wrote in the opening lines of the text, "Now, the teachings of yoga" — inviting readers to embark on a journey of self-discovery and spiritual growth.

The Yoga Sutras of Patanjali describe various concepts that relate to the acquisition and processing of knowledge, memory, and states of consciousness. These concepts are known as chitta-vritis, or fluctuations of the mind.

Here are five key chitta-vritis to know, with actionable insights to apply them in your daily life:

Pramana (प्रमाण): This concept refers to valid knowledge or reliable means of acquiring knowledge. To cultivate pramana, focus on developing your powers of observation, logical deduction, and critical thinking. Seek out reliable sources of information and be mindful of biases that may influence your understanding of a topic.

Smriti (स्मृति): This concept refers to memory or the ability to retain information. To improve your memory, engage in activities that challenge your brain, such as learning a new language or skill. Practice recalling information without relying on external aids such as notes or technology.

Nidra (निद्रा): This concept refers to sleep or the state of unconsciousness. To improve your quality of sleep, establish a regular sleep schedule, create a calming bedtime routine, and optimize your sleep environment for comfort and relaxation. Consider incorporating yoga nidra or other relaxation techniques to deepen your sleep and promote overall well-being.

Vikalpa (विकल्प): This concept refers to imagination or the ability to create mental constructs. To use vikalpa effectively, cultivate a balanced approach to imagination that combines creativity with critical thinking.
Be aware of how your thoughts and beliefs influence your perceptions of reality, and use your imagination to generate positive outcomes and solutions to challenges.

Viparyaya (विपर्यय): This concept refers to mistaken understanding or incorrect knowledge. To avoid viparyaya, practice self-awareness and examine your beliefs and assumptions regularly. Be open to new information and perspectives, and challenge your own biases and limitations to deepen your understanding of yourself and the world around you.

As Patanjali wrote in the Yoga Sutras, "When the mind is still, then there is yoga" — emphasizing the importance of cultivating a state of inner calm and clarity as a means to connect with our true nature and achieve a state of union with the divine.

By incorporating these chitta-vritis into our daily practice, we can bring ourselves closer to this state of union and experience greater peace, happiness, and fulfillment in our lives.

","permalink":"/yoga/","summary":"Through the chitta-vritis of pramana, smriti, nidra, vikalpa, and viparyaya, Patanjali offers insights into the workings of the mind and how we can cultivate greater awareness and understanding of our own mental processes.","title":"Chitta-vritis: Exploring the Depths of Consciousness"},{"content":"I wonder how this AI thing is going to shape up in the near and distant future. Many have said, and I agree, that there hasn't been such a revolutionary growth in productivity since the industrial age. Paul Graham was likely the first one to point this out. He tweeted, "The striking thing about the reaction to ChatGPT is not just the number of people who are blown away by it, but who they are. These are not people who get excited by every shiny new thing. Clearly something big is happening."

Elon Musk likely tweeted about it as well. I mean, according to Sam Altman, he should be credited for raising humanity's ambition again after so long. He's right: we had completely forgotten humanity's role as explorers since the moon landing. It's as if the moon landing was only to shove it in Russia's face. But let's be honest: it was more than that. It was the first time we humans ever set foot outside our home, Earth.

We shouldn't stop here. Coop says, "Mankind was born on earth. It wasn't meant to die here." Mars is just the beginning. We have a long way to go: explore the worlds beyond our solar system, become a multi-planetary species, a multi-galaxy species. We aren't getting there until something fundamental improves.

History is testimony to the fact that fundamental changes are cyclical. The oldest one was probably agriculture. Once people realised they could farm crops like wheat and millets, and domesticate animals for milk and meat, civilisation was born. Maybe civilisation isn't the right word. Settlements? You get the point.

A few centuries later came something new: industries. People realised the importance of "pooling". Money could be pooled by many people to venture into riskier ventures. Pooling distributed risk and returns — making things palatable to bigger groups than just kings and aristocrats.

Pooling also resulted in governmental reforms. Opinions "pooled" together to become ideologies.
These ideologies resulted in factions within society, while taking away power from the hitherto central sovereign. The ideologies produced political parties that easily filled the void left by the incumbents, usually the monarch. Democracy became the norm.

But not only positives came out of "pooling". Like yin and yang, there are good and bad sides to all innovations. Hatred pooled as well, and we got "international" terrorist organisations. (Likewise, agriculture resulted in abuse of the soil ecosystem and underground water.)

For the last few decades, we had slowed down in our innovations. Don't get me wrong: we created personal computers, iPhones and the internet. But they could only take us so far. Until AI entered the picture.

Picture reminds me: have you seen the movie Interstellar? Do you remember how dead navy seals' minds were installed into multi-purpose robots built by NASA to help humans explore the vast space while the earth was dying? Maybe, just maybe, we don't need to do that if we are able to build a general-purpose AI.

TARS and CASE are the two robots which are part of the Endurance crew in Interstellar.

There are naysayers to general-purpose AI, or Artificial General Intelligence (AGI). Naval Ravikant is one that I can recall. According to him, all the problems we're solving with AI are pretty closed-form problems. Beating a human at Go isn't too surprising: after all, there's a strategy to win, and with enough computation the strategy can be statistically estimated.

What would he say about ChatGPT? I don't know. I haven't seen anything from him yet. But I imagine he'd say something along the lines of the Chinese room thought experiment.

Imagine that there is a person locked inside a machine. It is not you; you're the lab technician tasked with studying this "machine", which is not actually a machine — something like the chess-playing Mechanical Turk, which beat humans at chess but was later found to be operated by an actual Turk sitting inside.

The Mechanical Turk being operated by a human chess player.

Anyway, you start exchanging messages with this machine. The messages are in Chinese. You fully understand Chinese, but the person sitting inside can't even read or speak 普通话 (Mandarin). He simply has a notebook (in English) which tells him what characters to respond with when he receives a particular set of characters.

The experiment continues for a year. Possibly more. You as a researcher have begun exchanging love letters with this "machine". (Why does everyone want to talk about love and sex with the AI? Sometimes even the AI wants to — like when the New York Times author spent two hours chatting with Bing AI.) The machine responds likewise. The person sitting inside the room has no clue he's responding to love letters.

Would you say this "machine" is a form of AI? Of course not — it's manually operated by a human, like the chess-playing Turk.

But forget for a moment that it's a lab subject communicating, and suppose it is, in fact, a machine responding. Does a seemingly understandable and romantic message mean anything to the writer machine?
Or is it simply regurgitating the response it can statistically predict best, using the notebook it's provided with?

Stephen Wolfram was surprised to see how ChatGPT could be so good with its responses. It is indeed surprising. It predicts, token by token, what should come next after a specific token. This ability of self-attention, i.e. focussing on the parts of speech that relate to something else, is a relatively new innovation. Google invented the transformer in 2017, which improved its translation abilities manifold. (It sucks that translation is still that bad.)

ChatGPT is so good because it has been manually trained to be so: human feedback on responses helps statistical models tremendously. OpenAI Used Kenyan Workers on Less Than $2 Per Hour: Exclusive | Time. ($2 per hour isn't low by Kenyan standards, I must add.)

Google seems to be losing the battle, like most incumbents of power. Even Machiavelli recognises it: "It must be considered that there is nothing more difficult to carry out, nor more doubtful of success, nor more dangerous to handle, than to initiate a new order of things." (Chapter 6, The Prince). Microsoft won the battle of launching a minimum viable product. But the war is still on.

Though it is difficult to say if it is a war or simply a competition. There is more evidence for the latter. Google and Meta were simply afraid of releasing such powerful generative AI tools to the general public; it could have dire consequences. In his book Zero to One, Peter Thiel posits that it is more difficult "going from zero to one" — creating something new — than "going from one to n" — copying or scaling something that already exists.

Even OpenAI wasn't sure of launching ChatGPT. Sam Altman's decision wasn't appreciated by many in the company, including some in his executive team. But their belief in the "Overton window" resulted in others yielding. (Few people know that ChatGPT was available in the OpenAI Playground long before, and it can still be used there with almost zero downtime.)

The Overton window, named after Joseph Overton, is a model that describes the range of ideas and policies that are considered acceptable to the public at a given time. It is useful to identify the range of ideas or models that are acceptable to the public, and then to work within that range to make decisions that are more likely to be accepted. By doing so, decision-makers can increase the chances of success and minimize resistance or backlash from the public.

The Overton window describes the range of ideas and policies that are considered acceptable to the public at a given time.

We have loved ChatGPT since it was launched. It was the talk of the town. Even though we were slightly concerned about some jobs vanishing, the general sentiment was positive.

Sam acknowledges this "AI taking over jobs" hypothesis. He supports the idea of Universal Basic Income (UBI) as a potential solution to help those whose jobs are displaced by these new technologies. It could be funded in part by the companies that benefit most from this automation, and by the government.

I'm not so sure of UBI. It's not a silver bullet. People do jobs for multiple reasons; money is just one of them. Work gives identity. Doctors proudly say that they're doctors.
It also gives meaning to life. A fisherman would prefer a freshly caught fish over some half-eaten fish given to him for free. A teacher works hard to change a student's life, at least when they're devoted to their work.

It is also just a matter of time before the non-productive workers living on UBI become the majority and start voting themselves more money. Pooling would again cause factions, and the bigger pool would ask for a bigger fund.

It's also quite expensive. If we choose to give $25,000 per year as UBI (approximately the US poverty line) to 40 million people (the population under the poverty line), it'd cost the state exchequer $1 trillion, or around 5% of US GDP. That's more than what the US spends on its military (3.5% of GDP).

But I still didn't answer the question I started with: what would the world look like when AI becomes commonplace? I don't wanna guess. Sorry — I do have guesses, but I don't wanna tell you. There are hundreds of pundits online speculating, and my guesses are as bad/good as theirs.

What I do know is that I shouldn't miss riding this wave. My surfing teacher told me: "when you see a wave coming towards you, there are two options. You're either gonna be hit by it or you can ride it. Timing is important for catching it, and balance is important for staying afloat. But the most important thing is not missing a good wave."

I'm not missing the wave.

","permalink":"/ai2/","summary":"Revolutionary changes are cyclical. First was agriculture. More recent one was the growth of industries, powered by pooling. It looks like AI is the latest one.","title":"I wonder how this AI thing is going to shape up"},{"content":"

"Do not be too timid and squeamish about your actions. All life is an experiment. The more experiments you make the better. What if they are a little coarse, and you may get your coat soiled or torn? What if you do fail, and get fairly rolled in the dirt once or twice. Up again, you shall never be so afraid of a tumble."

— Ralph Waldo Emerson

Previously, I've argued that improbable doesn't mean impossible. If my old argument isn't convincing enough, I encourage you to take a look at this Veritasium video explaining why most published research is wrong.

In the same vein, I am advocating today for you to conduct psychological experiments on yourself rather than relying solely on previous research. I'm not the first person to come up with this idea. Mahatma Gandhi's autobiography, "My Experiments with Truth," reflects his approach to understanding himself and human psychology through experimentation.

What kind of psychological experiment? Let me share my personal experience as an example.

Throughout my life, I have considered myself an introvert, preferring the company of ideas over people. Some people might call me an ambivert (or anyone an ambivert, treating the terms introvert and extrovert as extremes). Nonetheless, I had been more of an introvert than an extrovert: I enjoyed the company of ideas more than that of people.

When I was starting graduate school, I was moving to a new country (the United States). No one knew me there. There was zero emotional baggage and no expectations about who I was.
My identity was small.

My undergraduate psychology class had taught me a cardinal rule about behaviours:

Attitude + Stimulus = Behaviour

Some psychologists replace attitude in this equation with personality; Buddhists replace it with Sanskara. I found this true from my own experience.

I thought: if I can't change my attitude, what happens if I change my behaviour? I know how I, an introvert, would react to a situation and how an extrovert would react to it. If I emulate how an extrovert would react, would I become an extrovert?

As it turns out, yes.

My friends in the US would not call me an introvert (I think). The Harshvardhan of here behaves differently in situations than the Harshvardhan of the past. (I've since come to realise extroversion is more energy-consuming than introversion, and thus I use extroversion as a tool in situations when I need it.)

If I had instead relied on published psychology, I would have had to find large-scale field experiments that tested whether subjects could change their personalities and attitudes by modifying their behaviour, if I'm lucky. The studies would've been successfully replicated, if I'm lucky.

And then I would still have to do the work of coming up with a plan for how to use this conclusion in my own life. Or get help from a therapist who can guide me through it — a time-consuming option that might still not work.

Therefore, I recommend you experiment on yourself and test if something works. Experimentation is cheap: it takes a week, and you'd know the results. Since you've experimented on yourself, you know it works for you, which sidesteps a major problem with most psychological research — it doesn't generalise well.

What are some things you can experiment with and find out? Here are two simple examples.

How long do you need to sleep?
Choose a three-day window where you will sleep for five hours, track your mood and how good you feel after waking up, and then revert to your normal sleep pattern. Repeat the experiment for six hours of sleep during the next three-day window, and continue until you have tested all desired time periods. Remember to sleep "normally" between the experiments, i.e. for four days a week. Through this experiment, you can discover how many hours of sleep work best for you, which is likely to vary from the commonly cited average of eight hours for adults.

How to be more curious?
Think of and recall every curious person you know of. Sherlock Holmes. Richard Feynman. Socrates. How would they react to a particular situation that you're facing right now? Modify your behaviour to align with theirs. This will make you feel uneasy, but remember, it's an experiment. You may decide from the results that you'd rather not be a naturally curious person. If you enjoy the modified behaviour, repeat. Soon you will have a modified attitude — you'd be more curious. If you don't like it, stop. Being naturally curious is not for you.

A Pro-tip
Do not disclose your experiments to others; definitely not while you're doing them. The energy required to explain yourself can be better spent focusing on the experiment.
With this experimentation, you are already beyond your comfort zone, and discouragement from others may hinder your progress.

Conclusion
Instead of relying solely on previously published psychological research, I suggest conducting personal experiments on oneself to test and develop new behaviors and attitudes. Experimentation is a cheap and effective way to determine what works best for you, and it allows for personalised results that can be applied directly to one's life. By trying new things and modifying one's behavior, a person can discover how to optimize their sleep, cultivate curiosity, or improve any other aspect of their life they wish to.

Addendum
Recently, Dea pointed me to this article (and this guy, Max Hawkins) about someone who randomized his life, taking experimentation to heart. This shows exactly what one should strive for — experimenting with new places to visit, new food to eat, and new activities to try.

You can watch his TED Talk here.

","permalink":"/experiments/","summary":"Instead of relying solely on previously published psychological research, I suggest conducting personal experiments on oneself to test and develop new behaviors and attitudes. Experimentation is a cheap and effective way to determine what works best for you, and it allows for personalised results that can be applied directly to one's life.","title":"How to Hack Your Own Mind?"},{"content":"

A ten-year-old kid was beginning his high school journey when he had to learn a third language. He was already learning two: his mother tongue Hindi and the common-speak English. But neither of them had prepared him for Sanskrit. Masquerading as grammar, Sanskrit felt to him more mathematical than a language needed to be. A language's purpose is to communicate. If it can communicate, it works. If it works, don't break it. (Chesterton's Fence.)

As much as he despised learning the new language, he loved the rhythm of it. Unlike Hindi or English, most Sanskrit literature is composed of hymns and poems. Sanskrit sounds sonorous. नभ: स्पृशं दीप्तम् — the motto of the Indian Air Force — sounded so cool that he repeated it over a hundred times a day looking at the sky.1 But all of that only improved his pronunciation of Sanskrit hymns. When it came to writing, he was already bad at Hindi, but he was even worse at writing Sanskrit.

After two years of an intense love-hate relationship — hate during exams and love during class — he finally had the option to drop Sanskrit. He expected his woes would improve. But alas, the complaints shifted to Hindi. Now, Hindi was the worst: stupid and overly limited.

A language's purpose is to communicate. If it can communicate, it works. If it works, don't break it.

He found respite in classes and always loved listening to literature but dreaded exams. He even wrote the Inter-House Declamation and Debate scripts, though they were full of spelling errors.

Two years later, he finally had the option to drop Hindi as well. Now, all that remained was English. English was simple. There are some 26 letters; you rearrange them to make words. You rearrange words to make sentences. There was no मात्रा, हलंत, की या कि, or the numerous other "minor mistakes" (as he used to call them) to worry about.

He slowly built up his vocabulary repertoire, learning new words every day.
For a brief period, he made a habit of picking up a new word, making a sentence with it, and writing it in his diary. He never read those entries again, but the act of writing was enough to remember them.

Bit and Byte: Computer Languages
In high school, he met a language that forever shaped his career: C++. He had played with computer languages before: telling the LOGO turtle to turn right, take three steps, make a 120-degree left turn, take three steps, make a 120-degree left turn and take another three steps to make an equilateral triangle. But back then, he played with the turtle; he wasn't "writing" in a language. At least he didn't think of it like that.

But C++ was different. It was powerful and could communicate precisely. Furthermore, unlike Sanskrit or Hindi, it had very few grammar rules. If you got the rules, you got the rules. When someone asked "why" about a rule, the only answer was that Bjarne Stroustrup decided to make it that way. He picked up the rules and started talking to the computer.

Why did I have to write cout << and not, say, print()? Why are strings stored as arrays?

Why could you write something that means nothing, does nothing and costs nothing?

[](){};

It's a lambda expression that captures nothing, takes no parameters and has no body. It just exists. Why does it work? Because it's permitted by some narrowly defined rules.

Talking to computers is straightforward. You say something to the computer, and it either gets it or doesn't. There is no way a computer would understand things differently from what you said; if it does, you didn't say it right. It doesn't have spatial memory either. A for loop means the same thing in a program that runs on an Intel chip as it does in a program that powers Google.

Computers are obedient slaves. Whatever you told them, they did — if they could.2 There was no otherwise. If they couldn't, they would "fail". They had no brains.

Then he had an evil plan.

Uncle Ben said, "with great power comes great responsibility". Uncle Ben was late in telling this to Spiderman, and our hero hadn't watched Spiderman anyway. Being young makes you naive. He used his weekly quota of 10-minute internet time to conjure a plan. He would write code that would delete specific executables from Windows C://, breaking the computer irrevocably and requiring a clean reinstall of Windows.

C++ is a powerful language and can communicate precisely. (Looking at you, English!) He started with a program to delete a text file he created. Done. He put the text file in "Program Files" and asked the program to delete it. Done. He made the code delete some files in the "Win32" folder inside "Program Files". Once he was confident the code worked, he needed a goat to be slaughtered.

It couldn't be his own computer, obviously. His school's computer lab had about 50 computers, and the class size was 36. That left about 14 computers to be experimented with. All plans ready, he copied the .exe file to a pen drive, put it in a computer that no one used, and watched to see if it worked.

It damn well worked.

The only problem was that no one cared. The computer was reimaged in less than a week, and Bill's Windows was back!
He thought he'd have to one-up his game. Taking inspiration from hacking scenes in a Hollywood movie, he wrote code that would show exactly this on the screen. (All he remembers about the movie is that the hero had to crack a code to do something important. That required a password, and the hint was "what you sit on but never carry with you". The password was "Chair".)

> ? _

The cursor would keep blinking till you hit 11 keys. Any 11 keys. Unless you pressed Ctrl + Alt + Del, which would bring up the Task Manager, from which you'd have to kill the program. He would've eliminated the possibility of killing it via the Task Manager, but the next time he'd get his 10-minute internet time was a month later. And he hadn't discovered Stack Overflow yet. (Was it even a thing?)

With the backdoor in mind, he combined both tools into an executable .exe file which would show the black screen, wait for 11 keystrokes, and then delete a critical file which would crash the computer. It wasn't difficult to find which file to delete. Windows has been pretty vulnerable.3

To test this, he needed time and courage. He approached his other cool friend, someone who knew computers as well as he did. Much to his surprise, that friend was happy to sacrifice his own computer for the experiment! They sat together in front of the computer, double-clicked the .exe file and nervously hit 11 keys, after which the program closed.

Nothing happened. At least, they thought nothing had happened, initially. The friend passed some unflattering remarks at our hero. But the hero was confident.

A computer is an obedient slave with no brain. How could it not do what he asked?

Their confusion was quickly resolved: Windows soon gave an error and couldn't boot up. Mission successfully failed!

Back to Human Languages
All was good in Newfoundland till he started college. Now, he had to learn German. Oh, how much he dreaded human languages. Wasn't English enough?

Not according to the university. They believed "modern leaders need to know international languages, owing to growing interest in internationalisation". Much to his dismay, German grammar was very similar to Sanskrit grammar. Hate multiplied. He memorised enough words to pass the exam. He passed by a thin margin.

He also realised his English, which he took pride in, wasn't as good as he thought. Like a frog in a pond, he had little idea of what was happening across the country and the world. Studying with the "bests" (somewhat presumptuous, as they declared it themselves, like Britain) at the university showed him a mirror. All his debating and declamation experience was virtually useless when it came to impromptu speeches. As it turns out, he was just an intelligent parrot.

Realizing his limitations, he reached out for help. A professor advised him to start writing and reading extensively. But more than that, to start discussing it with others. He liked debating, but now he had to turn it into fruitful discussions. Slowly, he improved.

Around the same time, he encountered another language, masquerading as a statistical tool: R. The name sounded funny — it's a pun by the authors of the language on the first letters of their names.
But the language was sans all the void main() crap of C++.

Here's code to print the first ten Fibonacci numbers in C++. (He remembers writing code differently in Turbo C++. But today, he doesn't use C++ at all, and that flavour of C++ is not compatible with modern operating systems like macOS.)

C++ code for generating the Fibonacci sequence:

#include <iostream>
using namespace std;

// function to print first "n" Fibonacci numbers
void fibonacci(int n) {
    int f1 = 0, f2 = 1, i;
    if (n < 1)
        return;
    cout << f1 << " ";
    for (i = 1; i < n; i++) {
        cout << f2 << " ";
        int next = f1 + f2;
        f1 = f2;
        f2 = next;
    }
}

// Main function
int main() {
    fibonacci(10);
    return 0;
}

0 1 1 2 3 5 8 13 21 34
...Program finished with exit code 0
Press ENTER to exit console.

Here's how to do it in R.

# function to print first "n" Fibonacci numbers
fibonacci = function(n) {
  fibs = numeric(n)
  fibs[1] = 0
  fibs[2] = 1
  for (i in 3:n)
    fibs[i] = fibs[i-1] + fibs[i-2]
  return(fibs)
}

# Function call
> fibonacci(10)
[1]  0  1  1  2  3  5  8 13 21 34

C++ and R differed in one peculiar way: indexing starts at 0 in C++ but at 1 in R. He didn't think much of it then. Later, he realised what an outlier R was. R was succinct in conveying its messages.

The computer was an obedient slave. Now he had to say fewer words to communicate. And that he did.

He started doing more with computers soon. His most ambitious C++ project was a railway handling system (whose algorithm, he says, is too complicated to describe in this blog). With R he could create the back-end and the GUI. He didn't need to learn JavaScript to use JavaScript. It was as if he didn't need to learn any C++ to use C++. If he needed something, he could just use it.

Unlike Sanskrit, Hindi or German — whose grammar he dreaded — and like C++, R had few and simple rules. Again, all was merry in Newfoundland. He and the computer could understand each other. Many months passed. R and our protagonist became good friends.

A few years later, he met the new cool kid in town: Python. His initial reaction: it's neither new nor cool, and definitely not for kids. (He tells it as if he were Dylan from Severance explaining why Optics and Design were cut off from Macrodata Refinement.) Its rules were rather comical. While languages like Ruby were appreciated for their freedom to do things, Python had strict rules on how things should be done.

There should be one — and preferably only one — obvious way to do it.

The writers even added a Zen of Python: a set of 19 aphorisms, part programming advice and part life advice.4

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one -- and preferably only one -- obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

The first month was really troublesome. He was irritated by the weird choices of Python's developers. Unlike R, there was little documentation and so few examples! But he realised something else: if he kept choosing not to learn it, the language would of course stay bad. Once he chose to try it, it could get better.

And better it did get.

With concurrent use of the two languages, R and Python, he sometimes ends up writing R syntax in Python and vice versa. It's like he's thinking in two languages. He likes list comprehensions and f-strings but misses the magrittr pipe.

But I guess it's not too different from how he thinks about everything. Some parts are in his mother tongue Hindi, while other parts are in English. He prefers English nouns over Hindi nouns, but Hindi verbs over English ones. He prefers English adjectives over Hindi adjectives, but Hindi conjunctions over English ones. Pretty weird.

Sometimes people don't believe his mother tongue is not English. Especially native English speakers. Just like he has difficulty explaining to Pythonistas that he likes R, though he speaks R as well as Python.

The story doesn't end. He built something better with every subsequent iteration of improved computing technology that saved him some keystrokes. He learnt and iteratively improved his English while forgetting Hindi due to a lack of practice.

He took on small projects to ensure he didn't forget how to write Hindi. He published an article in the University Hindi Magazine as an undergrad. He had to use Google's Hindi Input Tools — his spelling mistakes would've only multiplied over the years. Writing by hand was not a good idea.

It took him four hours to write a two-page article. His last fifty Google searches were translations of an English word to Hindi.

These days, he's enjoying generative AI. It's like he's found an obedient slave that's actually intelligent. R requires fewer words than C++, and ChatGPT can simply talk.

These newer systems don't need him to talk in any one language. He can talk in any language he likes. GPT will connect the dots.

Where would the languages take him next?

1. Nabh Sparsh Diptam stands for "Touch the Sky with Glory". He knew he wanted to be an Air Force pilot. He failed a flight simulation test, which can't be retried. https://afcat.cdac.in/AFCAT/motto.html

2. If Deathnote's notebook could have rules, this could certainly have rules.

3. Apple used to have multiple security touch points in place. Gatekeeper would require extensive permissions (chown, anyone?). Windows had a more lax attitude. Things have changed now and both are pretty secure. I don't think our hero would be able to pull off something like this today.

4. The Zen of Python is a collection of 19 guiding principles that influence the design of Python. The principles were written by software engineer Tim Peters; he wanted Guido van Rossum, the creator of Python, to add a 20th principle.
However, this never happened, so the current number stands at 19.

","permalink":"/languages/","summary":"A ten-year-old kid was beginning his high school journey when he had to learn a third language. He was already learning two languages: his mother tongue Hindi and the common-speak English. But neither of them had prepared him for Sanskrit.","title":"From Bits to Words: A Tale of Computing and Communication Languages"},{"content":"Between December 15 and 25 of 2022, I attended a meditation course called Vipassana. Vipassana is a Pali1 word that means "seeing things as they are". The course promised to teach me how to have a clear awareness of exactly what is happening as it happens. It is a form of "mindfulness meditation".

What is Vipassana?
It is an ancient Indian technique for calming your mind, discovered as early as 2000 BC, when it was described in the Rig Veda. Over time, people added to the method or removed parts from it, and it eventually disappeared into the void. Common additions were focussing on a symbol, a god, or a chant. Any addition to something this pure only makes it impure.

Around 600 BC, Buddha rediscovered the technique during a period of intense meditation and used it to attain enlightenment. On the night he attained enlightenment, he was practicing Adhisthan (sitting of self-determination), where he resolved not to move until he attained enlightenment. Having learned the benefits of the technique, he then taught Vipassana to his disciples, and it was passed down through the generations.

Over time, the practice of Vipassana faded away in India but was sustained by a small group in Myanmar led by Sayagyi U Ba Khin.2 It was revitalized by a Burmese-Indian teacher named S.N. Goenka in the 20th century.

The Elephant
An elephant is a powerful creature. Weighing over 6,000 kg (14,000 pounds), it can easily crush a human with a single foot when it goes wild. However, once we gain control over elephants, we can make them work for us. They can lift heavy logs and arrange them for us.

Our mind is similar. It is very powerful. It can be used constructively when we are in control of it, or destructively when we let it control us.

The Art of Living
Buddha taught that ultimate peace can be attained by being equanimous and realizing the temporary nature of reality. Attachment to material possessions, desires, and relationships causes suffering; true happiness and peace can only be achieved by letting go of these attachments and accepting the impermanent nature of all things. By practicing mindfulness and detachment, we can cultivate a state of equanimity that allows us to remain calm and peaceful in the face of life's challenges.

Being equanimous about outcomes means not being attached to the results of our actions, accepting whatever happens with a peaceful mind. During Vipassana, I became aware of the sensations of my body and how they changed from moment to moment. This experience allowed me to realize the temporary nature of all sensations, including emotions.

For example, I may feel happy in one moment and then feel sadness in the next, but both emotions come and go, just like physical sensations. By accepting this impermanence and not being attached to any specific outcome, I was able to cultivate a more peaceful and equanimous mindset.

S.N. Goenka's Story
Goenka came from a wealthy Marwari family whose ancestors had relocated from India to Myanmar for business opportunities. Despite his success, Goenka suffered from frequent migraines and other mental health issues. To manage his pain, Goenka\u0026rsquo;s doctor prescribed a regular dose of morphine, but the situation only worsened over time. Desperate for a solution, Goenka turned to religion and read various Indian texts, including the Bhagwat Geeta, which he had recited since childhood, but to no avail.\nOne day, Goenka came across a Vipassana camp run by Sayagyi U Ba Khin. Intrigued by the practice, Goenka decided to attend the camp and was immediately cured of his migraines and other health issues. He was so struck by the transformative power of Vipassana that he became a lifelong practitioner.\nA few years later, Goenka\u0026rsquo;s parents fell ill, both mentally and physically, and were losing control of their bodies and memories. They were living in India, so Goenka returned home to teach them the practice of Vipassana. Some other villagers also joined in and they, too, experienced the benefits of the practice.\nThis was the beginning of the modern Vipassana movement. As word of its effectiveness spread, more people started to seek out Goenka\u0026rsquo;s teachings. Today, Vipassana is practiced by millions of people all over the world and is widely recognized as one of the most powerful mindfulness meditation techniques.\nMy Experience of Vipassana Vipassana, as taught by Buddha, is a practice that is devoid of religious or spiritual overtones. During the first three days, I focused on being mindful of my own breath, paying attention to the sensations in the area above my upper lip and nose. This helped me to improve my ability to concentrate and focus.\nOn the fourth day, I was introduced to the practice of Vipassana. Using the skills I developed in the first three days, I started to scan my body from head to toe, becoming aware of sensations and emotions as they arose. Over the next few days, I worked on increasing my ability to scan my body and on recognizing the impermanence of sensations and emotions.\nI became happy, angry, or sad as my mind jumped from meditation to a past memory or a future desire. I was eagerly looking forward to eating Paani-puri. I noticed how my body sensed those memories and desires. I saw those sensations come and go. I realised the temporary nature of their visits.\nBuddha said that your mind will remember how it felt even after the original stimulus goes away. Every sensation is temporary, but the resulting desire lasts.\nThis desire would cause cravings. Those cravings would lead to consumption and even deeper desires.\nBy the fifth day, I learned how to be indifferent to all emotions and understand that my happiness and sadness are the result of my attachment to sensations. The rest of the program was focused on continued practice and letting go of known desires. The process can be intense, but it has helped me to gain a deeper understanding of my mind and the impermanence of emotions.\nThe rest of the days are mostly practice. For me, those days were instrumental. On days 7 and 8, I felt like I was going to boil with all the heat I felt during the meditation. Their explanation is that it\u0026rsquo;s your mind getting rid of known desires.\nDay-by-day Summary This is for my own notes. I would recommend you visit Dhamma\u0026rsquo;s website and schedule a session with them.
I am not qualified to teach Vipassana.\nNoble Silence has to be practiced at all times. Only when the body is quiet can we quieten our mind. Thus, we would not communicate with anyone, including via talking, eye contact, etc.\nIf you have questions regarding your practice, you can visit the teachers during their office hours. They also schedule a check-in session every two days in small groups of 6-8 students.\nDay 1\nFocus on the triangle area. Just keep your focus there with no other motives.\nIf you get distracted, no worries. Go back to the triangle. No regrets. Or blaming.\nDay 2\nSee if you can identify which nostril you are breathing in from.\nYou could be breathing in and out from the left nostril, right nostril, or both nostrils.\nDay 3\nRepeat Day 2 instructions.\nI learnt that my nose has three chambers. The top chamber goes into my skull. The middle chamber regulates how much air goes in. The lower chamber has some hair growth.\nDay 4\nVipassana Day\nScan your body from the top of the head to the toes, focusing on every inch of the skin, looking for ANY sensation.\nIt doesn\u0026rsquo;t matter what sensation it is. It could be an itch, it could be a sensation of numbness, it could feel like an ant walking, it could be the touch of clothes on skin.\nDon\u0026rsquo;t imagine sensations if you can\u0026rsquo;t feel any. Wait for some time if you can\u0026rsquo;t detect any sensation, but if you still can\u0026rsquo;t feel anything, move on to the next patch on your body.\nWhen you do feel a sensation, don\u0026rsquo;t stop to explore it. Move on to the next patch of skin.\nMy mind blew up the first time I moved my focus from the lower triangle to the top of the head.\nDay 5\nRepeat Day 4 instructions from today onwards.\nMove your attention from the top of the head to the toes of the feet. Then back from the toes of the feet to the top of the head.\nActually, the order doesn\u0026rsquo;t matter. What matters is that every inch of the skin is covered. This order only ensures that.\nDay 6\nSittings of Adhisthan (Self-Determination) start from today.\nIn these three one-hour-long sittings (of the total ten hours of meditation), ensure that you do not move your body at all.\nDon\u0026rsquo;t punish your body by forcing it through unbearable pain. The objective is to train your mind not to get distracted and move; it is not self-flagellation.\nDay 7\nInstead of moving your focus inch by inch, move it body part by body part, if you can.\nThe goal is to expand your ability to know.\nIf you are unable to focus on an entire body part at once, don\u0026rsquo;t get disheartened. Try again every few iterations.\nDay 8\nMeditation in Pagoda: If you want, you can meditate in pure silence without distractions in your cell in the Pagoda.\nRepeat focussing on the body, part by part, instead of inch by inch (if you can).\nExpand to focus on more body parts at once. Eventually, the entire body.\nDay 9\nExpand your attention from body parts to the whole body, if possible.\nIf you can scan your whole body at once, do that 1-2 times. Then, return to scanning the body inch by inch, part by part, and eventually the whole body again.\nWhen you\u0026rsquo;re able to do all the steps above: after scanning the entire body, focus on the spinal cord/backbone for any sensations. Start from the hippocampus area and go down to the tailbone.\nDay 10\nMaitri Divas\nNoble silence is lifted, and you can communicate with fellow meditators.\nRemember: Meditation is a powerful tool and must thus be used for positive outcomes.
Therefore, this final step is important.\nAlways end your meditations by telling yourself / chanting: May all beings, living or non-living, visible or invisible, be at peace.\nVipassana Reforms Lives of Prisoners in India Tihar Jail in New Delhi is notorious for its living conditions. The jail is overcrowded with petty as well as ferocious criminals. It has 20,000 prisoners lodged inside as against a capacity of 10,000. Some were locked in for smuggling hard drugs like cocaine, while others were there for murdering three people in five minutes. The jailers (prison guards) were trained in an outdated style that valued punishment over internal transformation. This typically resulted in high rates of rearrest: prisoners never learned how to live with real people in a natural environment, and the jail environment is very different from the real world.\nKiran Bedi, the first woman Indian Police Service officer, was made Inspector General (IG) of Delhi Prisons in 1993. She introduced many activities to engage prisoners and change their lives. She permitted regular communication and meetings with family. The quality of food improved drastically. Prisoners were happy, but such surface changes would be short-lived.\nAt the core of these changes was Vipassana, the long-term transformation tool, which started with a suggestion from a police officer who had managed his own anger issues with this ancient technique. One thousand prisoners participated in a course conducted by Mr. S. N. Goenka in Tihar Jail, New Delhi, in 1994.\nThe impact of the project was profound. Inmates who participated in the Vipassana meditation courses reported feeling more calm, centered, and in control of their emotions. They also reported a reduction in stress, anxiety, and aggression, and an improvement in their relationships with others.\nMoreover, the project helped to reduce the incidence of violence and drug abuse within the prison. Inmates who participated in the Vipassana courses were less likely to engage in fights or other violent behavior, and the overall atmosphere of the prison became more peaceful and harmonious.\nSeeing its positive results, the government of India recommended that every prison in the country should organize ten-day Vipassana courses for inmates.\nPrisoners continue to participate in Vipassana courses every month, at a permanent Vipassana center established in Tihar. Thousands of police officers have also attended Vipassana courses, at the meditation center in the Police Academy in New Delhi, and at other centers throughout India.\nHere are some videos on the topic:\nTED Talk\nDocumentary of Tihar Jail, where Vipassana was popularised by Kiran Bedi\nEvery moment aware, every moment equanimous.\nPali is an old Indian language that was derived from Sanskrit but used by commoners (Sanskrit was mostly used by priests for worship, etc.). Most of Buddha\u0026rsquo;s teachings are in Pali or Prakrit, which is a sister language to Pali.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIt is an interesting story how Buddhism spread from India to the world. Ashoka the Great - Rise of the Mauryan Empire Documentary is a good start. It also tells the story of King Ashoka the Great, who was a ferocious general known for killing millions but eventually converted to Buddhism and practised non-violence.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/meditation/","summary":"Between Dec 15 and 25 of 2022 I attended a meditation course called Vipassana.
Vipassana is a Pali word that means \u0026lsquo;seeing things as they are\u0026rsquo;. The course promised to teach me how to have a clear awareness of exactly what is happening as it happens. It is a form of mindfulness meditation.","title":"What happens when you meditate ten hours a day?"},{"content":"Nestled in the arid desert landscape of Rajasthan, India, the quiet town of Pushkar is a colorful mosaic of culture, spirituality, and history. Known for its captivating charm and mythical allure, Pushkar is an exquisite destination that intertwines old-world charm with the vibrant hues of contemporary life.\nThe Divine Pushkar Lake Pushkar Lake at night.\nAt the heart of Pushkar lies its iconic Pushkar Lake. According to Hindu mythology, the sacred lake was created when Lord Brahma \u0026ndash; the creator in the Hindu trinity \u0026ndash; dropped a lotus flower, and a lake sprang forth from that spot.\nThis enchanting water body is surrounded by 52 bathing ghats, where pilgrims take a dip to wash away their sins and seek blessings. The sight of the evening aarti performed on the ghats, with chants echoing in the backdrop, is a spiritual spectacle that I cannot forget.\nThe Singular Brahma Temple Pushkar houses one of the few temples in the world dedicated to Lord Brahma. The Pushkar Brahma Temple, with its distinct red spire and the image of a swan (Brahma\u0026rsquo;s vehicle), stands as a prominent symbol of the town.\nIntricate marble carvings and a sanctum adorned with silver coins mark the temple\u0026rsquo;s unique architectural beauty, rendering it an essential stop on every visitor\u0026rsquo;s journey.\nWhy are there so few Brahma temples, while there are so many Shiva or Vishnu temples (the other gods in the Hindu trinity)?\nGood question. There are at least three stories.\nLord Brahma married his daughter Saraswati According to the Padma Purana, one of the eighteen Mahāpurāṇas, a significant genre of ancient Indian scriptures, Lord Brahma is said to have married Saraswati, the goddess of knowledge, who was also his daughter, created from his own body.\nThis act was considered inappropriate, even though the laws of mortal relationships don\u0026rsquo;t strictly apply to divine beings. This invoked the wrath of other gods and led to the decree that Brahma would not be worshipped in the earthly realm.\nLord Brahma\u0026rsquo;s wife Gayatri cursed him Another story, also sourced from the Padma Purana, tells that when Lord Brahma was performing a fire sacrifice, his wife Saraswati was late to the event. In order to complete the yagna (ritual), Brahma married Gayatri, a milkmaid, and sat her in Saraswati\u0026rsquo;s place.\nWhen Saraswati finally arrived and found Gayatri in her place, she cursed Brahma, saying that he would not be worshipped on Earth.\n(This is the version I remember from reading Puranic stories. According to the temple placard in Pushkar, his wife was \u0026ldquo;Gayatri\u0026rdquo; and the milkmaid remains anonymous.)\nLord Brahma lied to Shiva in trying to prove his superiority Another story goes that once Brahma and Vishnu were engaged in a fierce argument about who was superior. To settle the dispute, Shiva asked both of them to find the end of the universe. Whoever found it first would be superior.\nVishnu transformed into a boar and started running in one direction, while Brahma took the form of a swan and flew in the other direction.
Both journeyed for eons but couldn\u0026rsquo;t reach the end.\nAfter a long and futile search, Vishnu conceded and returned, admitting that there\u0026rsquo;s no edge to the universe. However, Brahma, unwilling to accept defeat, encountered a Ketaki flower (Pandanus odorifer) on his journey upwards. He persuaded the flower to lie and testify before Shiva that Brahma had reached the edge where the flower resided.\nWhen Brahma and the Ketaki flower presented their false claim, Shiva, being omniscient, saw through the deceit. He was furious at Brahma for his dishonesty and pronounced a curse that Brahma would not be worshipped on Earth. He also banned the Ketaki flower from being used in any religious rituals.\nThese three stories give us three important lessons: the wise don\u0026rsquo;t fall prey to pleasure, the wise don\u0026rsquo;t act in haste, and the wise don\u0026rsquo;t lie.\nA side note: Brahma is very highly worshipped in Thailand. I don\u0026rsquo;t know why.\nFlavors of Pushkar Malpua, an Indian sweet dish. My mother used to cook it often, but I never had a sweet tooth. This is all Meenal\u0026rsquo;s palate.\nPushkar\u0026rsquo;s food scene is as vibrant and diverse as its culture. From local treats like Malpua (sweet pancake) and Poha (flattened rice dish) to global cuisines in the numerous cafes that line the market, there\u0026rsquo;s something for every palate.\nThe town is particularly famous for its unique Israeli cuisine, brought in by the numerous Israeli tourists and hippies, which now forms a delicious part of the local food tapestry.\nA local street in Pushkar. It is a stunning amalgamation of contemporary India with the modern west. On the left you can see a \u0026ldquo;Thela\u0026rdquo; (street cart) selling fresh fruits and vegetables (and food, snacks, etc.), while on the right you can see a store selling bananas and guitars under the same roof! Right in front, with the tagline \u0026ldquo;I ♥️ Pushkar\u0026rdquo;, is Madam D\u0026rsquo;Souza, another eatery focussing on Western-Indian fusion cuisine.\nSince Pushkar is a holy land, alcohol, cigarettes and non-vegetarian food are not permitted within the city limits. However, it has some of the best Bhaang lassi.\nIt is an uncanny experience to walk the streets of Pushkar, with their dilapidated buildings and mooing cows, and suddenly enter a pizzeria rated 4.7 stars on Zomato. La Pizzeria is uniquely ethnic, with delicious food.\nA Nook for Art and Craft The bustling markets of Pushkar are a haven for art and craft enthusiasts. From traditional Rajasthani attire, intricate silver jewelry, and colorful bangles to beautifully crafted leather goods, the marketplace is brimming with local artifacts.\nThese vibrant markets also offer a glimpse into the local lifestyle and customs, bringing visitors closer to the heart and soul of Rajasthan.\nTalking to a lady from Washington state, I learnt that she visits Pushkar every year to buy ethnic notebooks, earrings, dresses, etc. These things cost $1-2 in India while they sell at $20-30 in the US, at a minimum. Further talks with shopkeepers there revealed the global network they\u0026rsquo;re all connected to \u0026mdash; taking orders on WhatsApp, sending them via FedEx, and receiving payment via PayPal. Fascinating!\nIt is a must visit! In essence, Pushkar is a fascinating montage of ancient lore, divine spirituality, bustling markets, and culinary delights. It\u0026rsquo;s a town where tranquility and vibrancy coexist.
Despite being steeped in tradition and religious significance, it also embraces modernity and diversity, making it a must-visit destination.\n","permalink":"/pushkar/","summary":"This winter I visited Pushkar, home to the iconic Pushkar Lake and one of the few Brahma temples in the world. It is a town where tranquility and vibrancy coexist, where ancient mythology mingles with a modern lifestyle. Explore the flavors of Pushkar, the colorful markets, and learn the stories behind the rarity of Brahma temples.","title":"From Mythology to Modernity: The Dual Faces of Pushkar"},{"content":"TLDR: Twitter is shutting down Revue, the newsletter platform that I use for Next. Thus, I’m migrating to Substack. You shouldn’t need to do anything on your side. BUT: If you can’t find the newsletter, check your spam folder. And please mark this address as ‘not spam.’ If the newsletter isn’t in your spam folder either, you should look in the Promotions tab.\nYesterday I received surprising news: Revue, the newsletter platform owned by Twitter, was shutting down in less than three weeks. I landed on the information by chance; I had checked my emails infrequently this holiday season. Why am I salty about this? Because this newsletter runs on it.\nI started writing this more than a year ago. Seeing the #rstats community on Twitter coming up with new articles and reviews led me to create a collection of interesting ones. One evening, I thought, why not share it with the broader community?\nAlmost automatically, the format of five stories, four packages, three jargons, two tweets and one meme came to my mind.\nThe initial interest was pretty high. The newsletter launched with almost a hundred subscribers. Over time, I lost a few and gained a few. Initially, my target audience was R enthusiasts and learners.\nNow, I’ve seen people with all levels of expertise subscribing. The few responses I got from readers encouraged me to continue.\nRevue was great: it gave me wide reach due to its integration with Twitter. It was free with no ads. It had a simple interface for embedding links. With Revue coming to its end of life abruptly, I needed to find a replacement quickly. Substack was a natural choice, as now I crave consistency more than reach.\nI’ve migrated my old posts, but a few technical glitches need to be fixed. Some posts’ titles don’t show up well, but that should be fine, functionally speaking. I hope you will understand.\nBroadly speaking, it was a big lesson for me on how private companies hold such enormous sway over years of our work. With a few weeks of notice, they can decide to kill it. Completely. Welcome to capitalism.\nEven if they do not own the copyright, they control the reach and thus have absolute power.\nThis is another reason to own the space rather than just the content. I thought of hosting the newsletter directly on my website, but managing the subscribers’ list would require more work.\nAnyway, please head over to Substack and let’s talk there. Substack provides comments, so feel free to discuss the posts if you want!\nThere is also an option of a subscribers’ community, where I would post irregular prompts that subscribers can engage with and comment on.\nHope you have a wonderful year ahead.\n","permalink":"/substack/","summary":"Twitter is shutting down Revue, the newsletter platform that I use for Next. Thus, I’m migrating to Substack. You shouldn’t need to do anything on your side.","title":"Moving Next from Revue to Substack"},{"content":"In my high school, vacations were a treasure.
We would eagerly wait for those 52 days of summer break after those long six months of studies to meet family and friends. There would be another 28-day break in autumn, full of festivities like Diwali, Durga Puja and Chhath. Either way, we would get excited looking forward to it.\nAs the start date of vacation came closer, there was a unique turn of events that I would not understand until I was in grade 12 \u0026mdash; until I was the school captain.\nThe food consumption in our mess was calculated on a per capita basis. There was a fair understanding of how much a cadet eats in a day. The same \u0026ldquo;per-capita food consumption\u0026rdquo; would be multiplied by 800, with another 25 cadets\u0026rsquo; worth of food added for the variability.1\nOne day in grade 12, I was talking with the Mess manager to work out the mess menu for the coming week, as some vegetables currently on the menu were in short supply (or eggs, I don\u0026rsquo;t remember). While discussing the alternatives, I realised he had decreased the number of students in his calculation. From 800 students, he was only counting 650 students. That\u0026rsquo;s a pretty stark difference!\nInitially, I thought it was a calculation mistake. How naive I was! When I pointed this out to him, he explained:\nStudents\u0026rsquo; general food consumption decreases as we draw close to a vacation. Instead of eating the whole plate, they would reduce their diet in anticipation of home food. Additionally, they would rather eat outside, in the paid canteen, spending their savings before leaving for home. After all, they would get home food and a full bank as soon as they reached home!\nThis was startling to me. I didn\u0026rsquo;t have the slightest clue! In hindsight, it was obvious. Even I did it. But I definitely didn\u0026rsquo;t reduce my Chicken curry consumption \u0026mdash; I knew that. Curiously, I asked him about Chicken consumption during those weeks.\nChicken consumption doesn\u0026rsquo;t change. I guess that\u0026rsquo;s tasty enough that students do not cut their appetite. We notice an increase in rice consumption during those days and have to plan for extra rice and curry.\nWow!\nRemembering home and reducing consumption sounds like a childish thing to do. Seven years after my high school graduation, I found myself doing the same.\nIn my high school classroom, we wrote the details about the day at the corner of our blackboard (green board, to be technically correct). What\u0026rsquo;s the date today, which class is this, what\u0026rsquo;s the class strength and how many of us are present, who is the class teacher, what\u0026rsquo;s the quote of the day, and so on. JK Sinha sir, our class teacher, was particularly nitpicky that we do it regularly and do it right.\nWhen we were close to the vacation start date, there was a new addition: DLTGH, or Days Left to Go Home. The count began from 30 days before the vacation start date. We looked at the countdown with the same excitement with which many of us watch NASA or ISRO launch a new satellite.\nWhen the number was closer to zero, like seven or so, we couldn\u0026rsquo;t begin anything significant. We had no concept of homework in my high school, so that was out of the question anyway. I didn\u0026rsquo;t play a lot. I wouldn\u0026rsquo;t undertake any new personal project. I wouldn\u0026rsquo;t start reading a new book. Everything would be postponed until after vacation, or AV as we would call it.\nToday, I was at Chaiyos.
It\u0026rsquo;s a Thai restaurant near my home in Knoxville. I asked for lo mein with spice level 8. I thought it\u0026rsquo;d make me feel homely. But I was wrong.\nI couldn\u0026rsquo;t finish the entire plate though I\u0026rsquo;m sure I could do it any other day. What was different today?\nI was counting DLTGH. Same symptoms. Loss of appetite. No interest in starting something new. Pushing things to a later time.\nOur Headmaster, Wing Commander Shamim Akhtar, with some of us appointment holders, right before the school assembly started.\nFrom left to right: Sarabjeet (Nalanda House Captain), Manoranjan (School Sports Captain), Mayank (School Adjutant), Sudhanshu (School Vice-captain), Wing Commander Shamim Akhtar, IAF (then Squadron Leader, Headmaster sir), me (School Captain), Ehtesham (School Academics Captain) and Pratik (School Co-curricular Captain).\nThe exact numbers vary depending on the present strength of the school.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/dltgh/","summary":"Today, I was at Chaiyos. It\u0026rsquo;s a Thai restaurant near my home in Knoxville. I asked for lo mein with spice level 8. I thought it\u0026rsquo;d make me feel homely. I was wrong.","title":"DLTGH: Days Left to Go Home"},{"content":"You rarely come across a story so powerful that you experience so many different feelings \u0026mdash; at the same time. Ted Chiang\u0026rsquo;s \u0026ldquo;The Truth of Fact, the Truth of Feeling\u0026rdquo; does that. It evokes several strong feelings, one after another, that will leave you soul-searching.\nMemories are tricky. I should know that; I\u0026rsquo;m terrible at remembering things. In interviews, I\u0026rsquo;m never worried about strengths and weaknesses questions. I already know my answer. I\u0026rsquo;m terrible at remembering things, so I take extensive notes and use Reminders religiously.\nBut from what I know, I\u0026rsquo;m not alone in my suffering. Some are too shy to admit it, and most are too ignorant to accept it. Stories are probably one way to experience this fallibility of memories.\nThis summer, I read the book \u0026ldquo;The Sense of An Ending\u0026rdquo; by Julian Barnes. The narrator, Tony, has a memory problem as well. He warns us, the readers, that he is not to be trusted. He\u0026rsquo;s only telling us the story \u0026ldquo;how he remembers it\u0026rdquo;. It may not be accurate and definitely won\u0026rsquo;t be complete.\nThe storytelling is engaging. I finished the book in about two weeks, thanks to my 40-minute commute to my office. The story\u0026rsquo;s ending drags longer than you\u0026rsquo;d expect, and everyone keeps telling Tony that \u0026ldquo;he doesn\u0026rsquo;t get it and should stop trying\u0026rdquo;. But he\u0026rsquo;s old and persistent. Finally, he gets that his memory is not accurate. His brain skipped essential parts that changed the story\u0026rsquo;s course.\nToday, I finished Ted Chiang\u0026rsquo;s story. I don\u0026rsquo;t remember when I sent it to my Kindle, but it must have been around a month ago. When I started reading it, I had no recollection of what it was \u0026mdash; whether it was an essay or a blog or a story. But I was instantly hooked on it.\nIn a fictional world, a company has developed a device called Remem, which can offer people digital lossless memory forever. Like in Black Mirror, you can record everything you see and recall it by sub-vocalising it.
You say, \u0026ldquo;the time when I went on my first coffee walk with Meenal\u0026rdquo;, and the memory plays in the corner of your eye.\nWith this lifelong memory, a lot of disputes are settled instantly. The caseload in courts is small, and whodunnits are not fun anymore.\nIn a parallel story, albeit in a different century, a tribe deals with Europeans\u0026rsquo; newfound interest in their daily matters. Jijingi is young and becomes friends with the Christian missionary, Moseby, who is teaching the village about God. The God. The reason for European prosperity and wealth. Jijingi learns to read and write with Moseby \u0026mdash; the only kid in town to do so.\nLearning how to read and write is not natural. Jijingi has questions about words and why we need spaces between words. Over time, he becomes more mindful of the transitions between words. He learns words, sentences and paragraphs. He learns:\nAnd words were not just the pieces of speaking; they were the pieces of thinking. When you wrote them down, you could grasp your thoughts like bricks in your hands and push them into different arrangements. Writing lets you look at your thoughts in a way you couldn\u0026rsquo;t if you were just talking, and having seen them, you could improve them, make them stronger and more elaborate.\nWhat is the point of writing for him? What is the point of recording their life for the millions of Remem users?\nPerhaps the answer lies in understanding what memories are. As we have established, memories are not perfect. Recordings \u0026mdash; written or digital \u0026mdash; provide a point of verification. But more important than that, they help in closing the feedback loop. They help us \u0026ldquo;forgive and forget\u0026rdquo;.\nYou would believe near-perfect memory would be a lifesaver, but think again.\nWhat might it be like to have a perfect memory? Arguably the individual with the best memory ever documented was Solomon Shereshevskii, who lived in Russia during the first half of the twentieth century. The psychologists who tested him found that he could hear a series of words or numbers once and remember it months or even years later. With no knowledge of Italian, Shereshevskii was able to quote stanzas of The Divine Comedy that had been read to him fifteen years earlier.\nBut having a perfect memory wasn\u0026rsquo;t the blessing one might imagine it to be. Reading a passage of text evoked so many images in Shereshevskii\u0026rsquo;s mind that he often couldn\u0026rsquo;t focus on what it actually said, and his awareness of innumerable specific examples made it difficult for him to understand abstract concepts. At times, he tried to deliberately forget things. He wrote down numbers he no longer wanted to remember on slips of paper and then burnt them, a kind of slash-and-burn approach to clearing out the undergrowth of his mind, but to no avail.\nPsychologists distinguish between semantic memory \u0026mdash; knowledge of general facts \u0026mdash; and episodic memory \u0026mdash; recollection of personal experiences. We\u0026rsquo;ve been using technological supplements for semantic memory ever since the invention of writing: first books, then search engines. By contrast, we\u0026rsquo;ve historically resisted such aids in episodic memory; few people have ever kept as many diaries or photo albums as they did ordinary books.\nWe have to forget a little bit before we can forgive. When we no longer experience the pain as fresh, the insult is easier to forgive, making it less memorable.
This psychological feedback loop makes initially infuriating offences seem pardonable in hindsight. Memory logs take this away from us.\nThe narrator mentions a powerful story where he remembers his daughter, Nicole, saying, \u0026ldquo;You\u0026rsquo;re the reason she left! You drove her away! You can leave too, for all I care. I sure as hell would be better off without you.\u0026rdquo; And to demonstrate her point, she stormed out of the house.\nMoseby has a more straightforward reason for why he writes. \u0026ldquo;verba volant, scripta manent\u0026rdquo;. He explains to Jijingi: in Tiv, you would say, \u0026ldquo;spoken words fly away, written words remain\u0026rdquo;. That\u0026rsquo;s why he carries the written sermons and a copy of his Bible even though he remembers them. To be correct.\nJijingi doesn\u0026rsquo;t have a good reason why people should write to remember. His language Tiv has two words for describing what\u0026rsquo;s correct.\nOur language has two words for what in your language is called \u0026rsquo;true.\u0026rsquo; There is what\u0026rsquo;s right, mimi, and what\u0026rsquo;s precise, vough. In a dispute, the principals say what they consider right; they speak mimi. The witnesses, however, are sworn to say precisely what happened; they speak vough. When Sabe has heard what happened, he can decide what action is mimi for everyone. But it\u0026rsquo;s not lying if the principals don\u0026rsquo;t speak vough, as long as they speak mimi.\nAs part of his research, the narrator goes through Nicole\u0026rsquo;s memory of him (Remem allows people in the memory to have access to the memory). He finds that it wasn\u0026rsquo;t her who said those hurtful words. It was him. He is shattered to pieces.\nSometime later, Europeans demanded that the tribes be consolidated. With over a hundred tribes, it was difficult to administer rules for them. They asked the tribes to reorganise into eight sects. Heads of clans meet; Jijingi observes.\nSabe, Jijingi\u0026rsquo;s clan\u0026rsquo;s head, believes they should join a specific clan based on their lineage. Another clan claims the same lineage to the same ancestor. Since they cannot decide the exact lineage, the question remains unsettled.\nJijingi discusses this with the European missionary, Moseby. Moseby takes him to the European capital, which has genealogy records. Jijingi finds that Sabe is wrong. He goes to Sabe, and Sabe politely dismisses it: Europeans trust paper more than people.\nSabe stopped walking and turned to face Jijingi. \u0026ldquo;Questions of kinship cannot be resolved by paper. You\u0026rsquo;re a scribe because Maisho of the Kwande clan warned me about the boys from the mission school. Maisho wouldn\u0026rsquo;t have looked out for us if we didn\u0026rsquo;t share the same father. Your position is proof of how close our clans are, but you forget that. You look to paper to tell you what you should already know, here.\u0026rdquo; Sabe tapped him on his chest. \u0026ldquo;Have you studied paper so much that you\u0026rsquo;ve forgotten what it is to be Tiv?\u0026rdquo;\nJijingi had become invested in getting the facts right \u0026mdash; like the Europeans. However, he failed to realise the actual purpose those memories served. The European assessment was vough; it was exact and precise. It wasn\u0026rsquo;t enough to settle the question. The question of which clan to join had to be right for the community; it had to be mimi.\nWriting is an instrument of technology. Like Remem.
When people start logging their thoughts and actions, the logs take precedence over memory, though that might not be for the best in every situation.\nFor a culture that transmits history through oral means, it is easy to revise history to suit the present needs. The idea that historical accounts shouldn\u0026rsquo;t change is a product of literate cultures\u0026rsquo; reverence for the written word. Anthropologists have long recognised that oral cultures understand the past differently. For them, their histories don\u0026rsquo;t need to be accurate so much as they need to validate the community\u0026rsquo;s understanding of itself. So it wouldn\u0026rsquo;t be correct to say that their histories are unreliable; their records do what they need to do.\nAfter all, history is the present\u0026rsquo;s memory of the past. It needs to complete the feedback loop.\nThis short story drove me through many emotions. I didn\u0026rsquo;t know how these two stories would come together. They might not; I\u0026rsquo;m reading Kafka on The Shore, and Murakami\u0026rsquo;s stories do not come together.\n","permalink":"/infallible-memory/","summary":"You rarely come across a story so powerful that you experience so many different feelings — at the same time. Ted Chiang\u0026rsquo;s \u0026lsquo;The Truth of Fact, the Truth of Feeling\u0026rsquo; does that. It evokes several strong feelings, one after another, that will leave you soul-searching.","title":"Infallible Memory: The Truth of Fact, the Truth of Feeling"},{"content":"Imagine a world without coffee \u0026ndash; it\u0026rsquo;s challenging, isn\u0026rsquo;t it? But how exactly did we come to cherish this energizing beverage? It all began around 800 AD when a shepherd in Ethiopia observed an unusual phenomenon. His goats, after nibbling on a certain shrub, seemed particularly alert, foregoing sleep and bleating well into the night.\nIntrigued, he brought this plant to local monks who concocted a drink from its berries. Thus, coffee was born.\nHistorian Wolfgang Schivelbusch, in his book \u0026ldquo;Taste of Paradise,\u0026rdquo; details the transformative effects of coffee in Europe and the Arab world. He highlights two major repercussions of the coffee trend. Firstly, the process of coffee brewing required boiling water, an act that was not customary before its introduction. As we know, boiled water eliminates most pathogens, significantly reducing water-borne diseases and enhancing public health.\nSecondly, coffee\u0026rsquo;s caffeine is known to promote linear thinking and boost productivity, a fact well recognized by coffee and tea aficionados.\nThese factors potentially contributed to the Arabs\u0026rsquo; significant advancements in various fields during the Islamic Golden Age, including science, mathematics, theology, philosophy, and engineering.\nFast forward to the 1650s when caffeine entered European culture. Prior to its introduction, alcohol was a preferred beverage due to its relative safety compared to microbe-laden water. However, the arrival of coffee and tea from Asia marked the beginning of Europe\u0026rsquo;s Renaissance. With the health benefits of boiled water and the mental stimulation from caffeine, the age of enlightenment ensued, and coffee replaced alcoholic drinks as the popular beverage.\nThe Advent of Coffee Breaks At the height of World War 2, a Denver-based necktie manufacturer, Wigwam Weavers, faced a workforce crisis.
With many skilled workers drafted for the war, they had to recruit older adults and women who lacked experience in knitting intricate tie patterns. The solution to flagging productivity came in an unlikely form \u0026ndash; breaks for coffee, although the term \u0026lsquo;coffee breaks\u0026rsquo; had not been coined yet. This seemingly simple change resulted in a significant surge in productivity and quality.\nTo dive deeper into the fascinating history of coffee and its effects, watch this video: What Michael Pollan Learned from Quitting Caffeine for 3 Months. Michael Pollan is the author of the book This Is Your Mind on Plants.\nFrom stimulating scientific advancements to enhancing everyday productivity, the journey of coffee is a testament to its enduring allure. So, the next time you sip your coffee, remember its rich and transformative history.\n","permalink":"/history-of-coffee/","summary":"Coffee\u0026rsquo;s captivating history, from its discovery in Ethiopia to its influence on global health and productivity, unveils the transformative impact of this beloved beverage across centuries and continents.","title":"History of Coffee"},{"content":"\nHistory of Coffee Coffee has a funny history of how humans started consuming it. Sometime around 800 AD, one shepherd in Ethiopia wandered around with his goats and sheep. He noticed that his animals behaved funny when they ate a particular shrub. They didn\u0026rsquo;t sleep for long and were generally excited for the rest of the day. They kept bleating long into the night.\nHe brought the plant to some monks, who made a drink out of it. There \u0026mdash; we had coffee.\nA historian named Wolfgang Schivelbusch, in his book \u0026ldquo;Taste of Paradise\u0026rdquo;, presented the effects of coffee in Europe and the Arab world. Drinking coffee had two direct consequences. First, making coffee (or even tea) requires boiling water. Boiling water to make drinks was not standard at all before the advent of coffee. As you know, boiling water is the surest method to kill most disease-causing germs, including viruses, bacteria and parasites. Suddenly, there was an incredible boost in public health. The incidence of water-borne diseases fell drastically.\nSecond, caffeine in coffee has psychological effects as well. The drug helps people think linearly and makes them more productive. This point doesn\u0026rsquo;t require further elaboration for most coffee or tea drinkers.\nThis likely played a role in the Arabs developing sweeping advancements in science, mathematics, theology, philosophy, and engineering \u0026mdash; a period commonly referred to as the Islamic Golden Age. Algebra, geometry, trigonometry, astronomy, optics, biology and more.\nCaffeine entered European culture in the 1650s. Before caffeine became commonplace, people were drunk all the time. Why? It was safer than water. Water had many potential microbes, but fermented drinks were safe to consume. Even kids regularly drank cider.\nWith coffee and tea arriving from Asia, Europe saw its renaissance. The advancements were quick. Boosted by the public health gains of boiled water and the psychological benefits of caffeine, these drinks fuelled many developments, like the age of enlightenment. They became the new popular drinks, replacing alcohol-based beverages.\nCoffee Breaks A necktie manufacturer named Wigwam Weavers in Denver had lost most of their excellent workers to World War 2.
They hired older adults who weren\u0026rsquo;t drafted in the war but weren\u0026rsquo;t skilled at knitting the intricate patterns in ties. Then, they hired women for the job. Women were terrific at the job. However, they could only do it for a few hours at a stretch.\nThe company managers called them for a meeting and asked: what can we do to make you more efficient? They asked for \u0026ldquo;coffee breaks\u0026rdquo;, though they didn\u0026rsquo;t call it that at the time. Overnight, productivity and quality shot up.1\nRediscovering Coffee Dea reintroduced coffee to me this summer. Of course, I had always enjoyed coffee. It is far better than Chai (चाय), to which most Indians are hooked. Around Portland, we went about trying new cafes and their coffees. At the time, I couldn\u0026rsquo;t spot any difference between one coffee and another.\nEven today, I struggle to spot the difference between coffees from different regions. I can identify different roasts better than a coin flip \u0026mdash; light roasts are bitter, while dark roasts are chocolaty.\nThere is so much more to it. Take blends: they are cheaper than single-origin coffee but often taste less good. Most cafes, including Starbucks, have only blends. Some blends are good, but generally, single-origin coffee tastes better.\nSoon, I wanted to try more types of coffee. I was looking for a new hobby, and making my own fancy coffee sounded like an excellent one to pick up.\nOh, by the way, when I say coffee, I mean Latte. I am not an espresso person. This significantly reduces the types of coffee I can try at a cafe, as they generally only have options for espresso drinks.\nCoffee I\u0026rsquo;ve Made On Dea\u0026rsquo;s recommendation, I got a subscription to Trade Coffee. They send me a coffee every few weeks, whenever I\u0026rsquo;m on the verge of finishing the last pack. They asked my preferences before I began, and I rated each coffee after I tried it. Then, based on my ratings, they optimise finding the next best coffee from their inventory.\nHere are the coffees I\u0026rsquo;ve tried.\nCafes in Knoxville These are some of the cafes I\u0026rsquo;ve tried in Knoxville.\nCafe Du Monde, New Orleans Their Café au lait might be the only reason why I would visit New Orleans ever again.\nOther Interesting Stuff about Coffee The sensory experience of making coffee Arabica Coffee Bean Varietals You can learn more about it from this video: What Michael Pollan Learned from Quitting Caffeine for 3 Months. Michael Pollan is the author of the book This Is Your Mind on Plants.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/coffee/","summary":"Recently, I picked up a new hobby: coffee. This note chronicles my journey.","title":"Notes on Coffee"},{"content":" Business Context CK Cafe is an Indian food and beverages chain with about 19 outlets in 5 cities. Their outlets are popular “hangout” places for young and old alike. People often go to their stores to meet friends and family, or just to get their Chai-tea or coffee. Imagine a cafe, basically.\nTheir prices are not low by Indian standards, but they aren’t a luxury store either. They offer about 100 items at their store, though only about 20 generate most of the revenue.\nTheir two most popular items are the Chai (tea) and Coffee (which they like to call Kaapi). Chai can be of several types, depending on the spice in it. It could have ginger (Adrak) and be called Adrak Chai, for example.
Below, I’m listing some popular food items and their details.\nAdrak Chai / Kadak Chai / Elaichi Chai / Other types of Chai: Chai-tea with Ginger / Chai-tea with strong spices / Chai-tea with Cardamom / etc.\nKulhad Chai: Chai-tea served in an earthen pot. Popular in Northern India, especially New Delhi.\nIndian Filter Kaapi: Filter Coffee, popular in Southern India.\nPaneer Puff: A croissant-like bread filled with Paneer (Indian cottage cheese).\nVeg Club Sandwich: Vegetarian sandwich with grated vegetables, cheese, etc.\nMaska Bun: Bread and butter; commonly eaten with Chai.\nBiryani: A slow-cooked rice dish made with Basmati rice, spices and a choice of meat or vegetables.\nData Analysis You can find the dataset and codes on my GitHub.\nLoading Packages and Setting Working Directory Tidyverse for manipulation and visualisation. arules and arulesViz for association rules mining and visualisation. I like the theme theme_clean() from the ggthemes package.\nlibrary(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✔ ggplot2 3.3.6.9000 ✔ purrr 0.3.4 ## ✔ tibble 3.1.7 ✔ dplyr 1.0.9 ## ✔ tidyr 1.2.0 ✔ stringr 1.4.1 ## ✔ readr 2.1.2 ✔ forcats 0.5.1 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() library(arules) ## Loading required package: Matrix ## ## Attaching package: 'Matrix' ## The following objects are masked from 'package:tidyr': ## ## expand, pack, unpack ## ## Attaching package: 'arules' ## The following object is masked from 'package:dplyr': ## ## recode ## The following objects are masked from 'package:base': ## ## abbreviate, write library(arulesViz) library(DT) theme_set(ggthemes::theme_clean()) Loading Data You can load the CSV data and then convert it to a list format as required by the arules package. It will take about 3 minutes to process (a faster alternative is sketched after the data summary below).\n# NOT RUN df = read_csv(\u0026#34;CK_data_anon.csv\u0026#34;) %\u0026gt;% janitor::clean_names() df1 = df %\u0026gt;% select(invoice_name, item_name) invoices = unique(df1$invoice_name) all_items = list() for (i in invoices) { l = df1 %\u0026gt;% filter(invoice_name == i) %\u0026gt;% pull(item_name) %\u0026gt;% as.character() all_items = append(all_items, list(l)) } Or, you can directly import the list file I created for you after processing it. Download it here.\ndf = readRDS(\u0026#34;CK_data_anon.RDS\u0026#34;) Getting Ready for Analysis All association rules analysis has to be done on a transactions object. See ?transactions for more details.\nConverting the df to a transactions object.\ntrans = transactions(df) ## Warning in asMethod(object): removing duplicated items in transactions Let’s see a summary of what we have.\nsummary(trans) ## transactions as itemMatrix in sparse format with ## 56737 rows (elements/itemsets/transactions) and ## 211 columns (items) and a density of 0.00928914 ## ## most frequent items: ## Kadak Chai Water Bottle 500 ML Adrak Chai Indian Filter Kaapi ## 13910 10986 9748 8935 ## Elaichi Chai (Other) ## 3301 64325 ## ## element (itemset/transaction) length distribution: ## sizes ## 1 2 3 4 5 6 7 8 9 10 11 12 13 ## 24890 18374 7980 3315 1361 508 153 87 28 16 11 6 3 ## 17 19 20 30 ## 2 1 1 1 ## ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.00 1.00 2.00 1.96 2.00 30.00 ## ## includes extended item information - examples: ## labels ## 1 Aam Panna ## 2 Adrak Chai ## 3 Adrak Chai Full
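A quick aside: the faster loading alternative promised above. The for-loop builds all_items by filtering the entire data frame once per invoice, which is what makes it slow. Base R’s split() does the same grouping in a single pass. This is a minimal sketch, assuming the same df1 (with invoice_name and item_name columns) as above; it is not the code this analysis originally ran:\n# Group item names by invoice in one pass; split() returns a named list\n# of character vectors (one element per invoice), the same shape as all_items.\nall_items_fast = split(as.character(df1$item_name), df1$invoice_name)\n# The list plugs into arules exactly as before: transactions(all_items_fast)\nEither route yields the same transactions object. Let’s look at the most frequent items.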
Note that on the y-axis, we have the Support.\nitemFrequencyPlot(trans, topN = 20) Another way to visualise the data.\nggplot( tibble( Support = sort(itemFrequency(trans, type = \u0026#34;absolute\u0026#34;), decreasing = TRUE), Item = seq_len(ncol(trans)) ), aes(x = Item, y = Support)) + geom_line() Note the steep drop-off: a handful of items account for most of the purchases, and popularity falls off quickly after the top few.\nNumber of Possible Associations For this dataset, the number of possible associations is huge. But how many exactly?\n2^ncol(trans) ## [1] 3.291009e+63 Woah.\nFrequent Itemsets Let’s try to find the frequent itemsets.\nits = apriori(trans, parameter=list(target = \u0026#34;frequent\u0026#34;)) ## Apriori ## ## Parameter specification: ## confidence minval smax arem aval originalSupport maxtime support minlen ## NA 0.1 1 none FALSE TRUE 5 0.1 1 ## maxlen target ext ## 10 frequent itemsets TRUE ## ## Algorithmic control: ## filter tree heap memopt load sort verbose ## 0.1 TRUE TRUE FALSE TRUE 2 TRUE ## ## Absolute minimum support count: 5673 ## ## set item appearances ...[0 item(s)] done [0.00s]. ## set transactions ...[211 item(s), 56737 transaction(s)] done [0.00s]. ## sorting and recoding items ... [4 item(s)] done [0.00s]. ## creating transaction tree ... done [0.00s]. ## checking subsets of size 1 2 done [0.00s]. ## sorting transactions ... done [0.00s]. ## writing ... [4 set(s)] done [0.00s]. ## creating S4 object ... done [0.00s]. its ## set of 4 itemsets Support is a parameter that needs to be optimised. To see all parameters that can be optimised, see ?ASparameter.\nThe lower the support parameter, the higher the number of itemsets you can generate. For large datasets, you should start from higher support values and make your way down. In this case, I tried several values and found 0.1 gave me 4 itemsets, 0.01 gave me 52 itemsets, 0.005 gave me 104 itemsets, and 0.001 gave me 440 itemsets.\nIt will be your call to choose the right value of support. (A small sweep that reproduces these counts is sketched right after the run below.)\nits = apriori(trans, parameter=list(target = \u0026#34;frequent\u0026#34;, support = 0.001)) ## Apriori ## ## Parameter specification: ## confidence minval smax arem aval originalSupport maxtime support minlen ## NA 0.1 1 none FALSE TRUE 5 0.001 1 ## maxlen target ext ## 10 frequent itemsets TRUE ## ## Algorithmic control: ## filter tree heap memopt load sort verbose ## 0.1 TRUE TRUE FALSE TRUE 2 TRUE ## ## Absolute minimum support count: 56 ## ## set item appearances ...[0 item(s)] done [0.00s]. ## set transactions ...[211 item(s), 56737 transaction(s)] done [0.00s]. ## sorting and recoding items ... [123 item(s)] done [0.00s]. ## creating transaction tree ... done [0.01s]. ## checking subsets of size 1 2 3 4 done [0.00s]. ## sorting transactions ... done [0.01s]. ## writing ... [440 set(s)] done [0.00s]. ## creating S4 object ... done [0.00s].
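As flagged above, here is the promised support sweep. It is a sketch rather than part of the original analysis; it assumes the same trans object, and control = list(verbose = FALSE) simply silences the log that apriori prints for each run:\nsupports = c(0.1, 0.01, 0.005, 0.001)\n# Count the frequent itemsets found at each support threshold.\nsapply(supports, function(s) length(apriori(trans, parameter = list(target = \u0026#34;frequent\u0026#34;, support = s), control = list(verbose = FALSE))))\n# Per the counts reported above, this should return 4, 52, 104 and 440.\nPicking a threshold then becomes a matter of reading off this table instead of re-running the call by hand.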
its ## set of 440 itemsets Let’s see what we find.\nits = sort(its, by = \u0026#34;support\u0026#34;) inspect(head(its, n = 10)) ## items support count ## [1] {Kadak Chai} 0.24516629 13910 ## [2] {Water Bottle 500 ML} 0.19363026 10986 ## [3] {Adrak Chai} 0.17181028 9748 ## [4] {Indian Filter Kaapi} 0.15748101 8935 ## [5] {Elaichi Chai} 0.05818073 3301 ## [6] {Lemon Ice Tea} 0.04642473 2634 ## [7] {Kadak Chai, Water Bottle 500 ML} 0.04379858 2485 ## [8] {Masala Chai} 0.03985054 2261 ## [9] {Paneer Puff} 0.03831715 2174 ## [10] {Extra Cheese Grated} 0.03646650 2069 Let’s see how many items are brought together.\nggplot(tibble(`Itemset Size` = factor(size(its))), aes(`Itemset Size`)) + geom_bar() Most itemsets are of size two, followed by single items.\nLet’s see the most popular “couples”. inspect(its[size(its) == 2]) ## items support count ## [1] {Kadak Chai, ## Water Bottle 500 ML} 0.043798579 2485 ## [2] {Adrak Chai, ## Water Bottle 500 ML} 0.031372825 1780 ## [3] {Indian Filter Kaapi, ## Water Bottle 500 ML} 0.027407159 1555 ## [4] {Indian Filter Kaapi, ## Kadak Chai} 0.025239262 1432 ## [5] {Employee Meal, ## Kadak Chai} 0.018700319 1061 ## [6] {Adrak Chai, ## Extra Elaichi Flavor} 0.017819060 1011 ## [7] {Adrak Chai, ## Kadak Chai} 0.015774539 895 ## [8] {Adrak Chai, ## Indian Filter Kaapi} 0.015439660 876 ## [9] {Kadak Chai, ## Maska Bun} 0.011667871 662 ## [10] {Indian Filter Kaapi Large, ## Water Bottle 500 ML} 0.010451733 593 ## [11] {Elaichi Chai, ## Water Bottle 500 ML} 0.010345982 587 ## [12] {Extra Cheese Grated, ## Water Bottle 500 ML} 0.010099230 573 ## [13] {Adrak Chai, ## Elaichi Chai} 0.009552849 542 ## [14] {Paneer Puff, ## Water Bottle 500 ML} 0.009535224 541 ## [15] {Lemon Ice Tea, ## Water Bottle 500 ML} 0.008336711 473 ## [16] {Elaichi Chai, ## Kadak Chai} 0.007614079 432 ## [17] {Kadak Chai, ## Paneer Puff} 0.007120574 404 ## [18] {Masala Chai, ## Water Bottle 500 ML} 0.007050073 400 ## [19] {Adrak Chai, ## Maska Bun} 0.007014823 398 ## [20] {CK Sandwich, ## Water Bottle 500 ML} 0.006820946 387 ## [21] {Adrak Chai, ## Extra Cheese Grated} 0.006627069 376 ## [22] {Extra Cheese Grated, ## Veg Club} 0.006538943 371 ## [23] {CK Sandwich, ## Extra Cheese Grated} 0.006433192 365 ## [24] {Bana Ke, ## Paneer Puff} 0.005745810 326 ## [25] {Italian Noodles, ## Water Bottle 500 ML} 0.005728184 325 ## [26] {French Fries ‚Äì Piri Piri, ## Water Bottle 500 ML} 0.005534307 314 ## [27] {Exotic Corn Mayo, ## Extra Cheese Grated} 0.005305180 301 ## [28] {Elaichi Chai, ## Indian Filter Kaapi} 0.005199429 295 ## [29] {Adrak Chai, ## Masala Chai} 0.005146553 292 ## [30] {Adrak Chai, ## CK Sandwich} 0.005111303 290 ## [31] {Indian Filter Kaapi, ## Paneer Puff} 0.005111303 290 ## [32] {Exotic Corn Mayo, ## Water Bottle 500 ML} 0.005093678 289 ## [33] {Adrak Chai, ## French Fries ‚Äì Piri Piri} 0.005093678 289 ## [34] {Kadak Chai, ## Lemon Ice Tea} 0.004987927 283 ## [35] {Adrak Chai, ## Lemon Ice Tea} 0.004846925 275 ## [36] {Veg Biryani, ## Water Bottle 500 ML} 0.004794050 272 ## [37] {Cheese Chutney, ## Water Bottle 500 ML} 0.004758799 270 ## [38] {Extra Cheese Grated, ## Nachos with Dip} 0.004741174 269 ## [39] {Indian Filter Kaapi, ## Indian Filter Kaapi Large} 0.004582548 260 ## [40] {Veg Club, ## Water Bottle 500 ML} 0.004582548 260 ## [41] {Adrak Chai, ## Veg Club} 0.004547297 258 ## [42] {Indori Upma, ## Water Bottle 500 ML} 0.004423921 251 ## [43] {Adrak Chai, ## Exotic Corn Mayo} 0.004423921 251 ## [44] {Adrak Chai, ## Paneer Puff} 0.004423921 251 ## [45] {CK Sandwich, ## Kadak 
Chai} 0.004318170 245 ## [46] {Extra Cheese Grated, ## White Sauce Pasta} 0.004300545 244 ## [47] {Adrak Chai, ## Italian Noodles} 0.004177168 237 ## [48] {Extra Elaichi Flavor, ## Water Bottle 500 ML} 0.004124293 234 ## [49] {Adrak Chai, ## Chilli Garlic Cheese Toast} 0.004089042 232 ## [50] {Maska Bun, ## Water Bottle 500 ML} 0.004089042 232 ## [51] {Adrak Chai, ## Cheese Chutney} 0.004071417 231 ## [52] {Water Bottle 500 ML, ## White Sauce Pasta} 0.004018542 228 ## [53] {Desi Noodle, ## Water Bottle 500 ML} 0.004000917 227 ## [54] {Extra Cheese Grated, ## Kadak Chai} 0.003965666 225 ## [55] {Adrak Chai, ## Small Kulladh} 0.003824665 217 ## [56] {Extra Cheese Grated, ## Indian Filter Kaapi} 0.003807039 216 ## [57] {Chocolate Kaapi, ## Water Bottle 500 ML} 0.003754164 213 ## [58] {Kadak Chai, ## Veg Club} 0.003754164 213 ## [59] {Indori Upma, ## Kadak Chai} 0.003736539 212 ## [60] {Indian Filter Kaapi, ## Masala Chai} 0.003666038 208 ## [61] {French Fries ‚Äì Piri Piri, ## Kadak Chai} 0.003630788 206 ## [62] {French Fries ‚Äì Piri Piri, ## Indian Filter Kaapi} 0.003542662 201 ## [63] {Chilli Garlic Cheese Toast, ## Water Bottle 500 ML} 0.003525037 200 ## [64] {Masala Omlette, ## Water Bottle 500 ML} 0.003454536 196 ## [65] {Extra IFC Decoaction, ## Indian Filter Kaapi} 0.003419285 194 ## [66] {Exotic Corn Mayo, ## Kadak Chai} 0.003278284 186 ## [67] {Kadak Chai, ## Masala Omlette} 0.003225408 183 ## [68] {Indian Filter Kaapi, ## Lemon Ice Tea} 0.003190158 181 ## [69] {Italian Noodles, ## Kadak Chai} 0.003137283 178 ## [70] {Kadak Chai, ## Masala Chai} 0.003137283 178 ## [71] {Adrak Chai, ## Indori Upma} 0.003066782 174 ## [72] {Indian Filter Kaapi, ## Italian Noodles} 0.003066782 174 ## [73] {Cheese Chutney, ## Kadak Chai} 0.003013906 171 ## [74] {Frappe, ## Water Bottle 500 ML} 0.002961031 168 ## [75] {Elaichi Chai, ## Paneer Puff} 0.002961031 168 ## [76] {Egg Sandwich, ## Water Bottle 500 ML} 0.002802404 159 ## [77] {CK Brownie Blast, ## Water Bottle 500 ML} 0.002802404 159 ## [78] {Adrak Chai, ## Masala Omlette} 0.002802404 159 ## [79] {Indian Filter Kaapi, ## Veg Club} 0.002784779 158 ## [80] {Adrak Chai, ## Chocolate Chai} 0.002731903 155 ## [81] {Green Tea, ## Water Bottle 500 ML} 0.002696653 153 ## [82] {Sprouts Sauted, ## Water Bottle 500 ML} 0.002696653 153 ## [83] {Adrak Chai, ## Desi Noodle} 0.002643777 150 ## [84] {Indian Filter Kaapi, ## Maska Bun} 0.002643777 150 ## [85] {Adrak Chai, ## Veg Biryani} 0.002626152 149 ## [86] {Chilli Garlic Cheese Toast, ## Kadak Chai} 0.002590902 147 ## [87] {CK Sandwich, ## Indian Filter Kaapi} 0.002573277 146 ## [88] {Adrak Chai, ## Extra Adrak Flavor} 0.002538026 144 ## [89] {Thandi Kaapi, ## Water Bottle 500 ML} 0.002538026 144 ## [90] {Desi Noodle, ## Extra Cheese Grated} 0.002538026 144 ## [91] {Extra Adrak Flavor, ## Kadak Chai} 0.002485151 141 ## [92] {Adrak Chai, ## White Sauce Pasta} 0.002467526 140 ## [93] {Egg Sandwich, ## Kadak Chai} 0.002449900 139 ## [94] {Adrak Chai, ## French fries - Salted} 0.002432275 138 ## [95] {Egg Sandwich, ## Extra Cheese Grated} 0.002397025 136 ## [96] {Extra Cheese Grated, ## Lemon Ice Tea} 0.002397025 136 ## [97] {Desi Noodle, ## Kadak Chai} 0.002379400 135 ## [98] {Indian Filter Kaapi Large, ## Kadak Chai} 0.002379400 135 ## [99] {Cheese Chutney, ## Extra Cheese Grated} 0.002361775 134 ## [100] {Exotic Corn Mayo, ## Indian Filter Kaapi} 0.002361775 134 ## [101] {French Fries ‚Äì Piri Piri, ## Lemon Ice Tea} 0.002291274 130 ## [102] {Frappe, ## Indian Filter Kaapi} 0.002273649 129 ## [103] {Adrak 
Chai, ## Paneer Sandwich} 0.002256023 128 ## [104] {Nachos with Dip, ## Water Bottle 500 ML} 0.002256023 128 ## [105] {Kadak Chai, ## Veg Biryani} 0.002238398 127 ## [106] {Paneer Sandwich, ## Water Bottle 500 ML} 0.002220773 126 ## [107] {Masala Lemonade, ## Water Bottle 500 ML} 0.002220773 126 ## [108] {Irani Chai, ## Kadak Chai} 0.002203148 125 ## [109] {Extra Cheese Grated, ## Paneer Sandwich} 0.002185523 124 ## [110] {Adrak Chai, ## Irani Chai} 0.002185523 124 ## [111] {Adrak Chai, ## Frappe} 0.002185523 124 ## [112] {Extra Adrak Flavor, ## Kulladh Chai} 0.002132647 121 ## [113] {Chilli Garlic Cheese Toast, ## Indian Filter Kaapi} 0.002132647 121 ## [114] {Extra Cheese Grated, ## Italian Noodles} 0.002115022 120 ## [115] {Bana Ke, ## Water Bottle 500 ML} 0.002097397 119 ## [116] {Extra Adrak Flavor, ## Maska Bun} 0.002097397 119 ## [117] {Extra Cheese Grated, ## French Fries ‚Äì Piri Piri} 0.002097397 119 ## [118] {Indian Filter Kaapi, ## Indori Upma} 0.002079772 118 ## [119] {Egg Sandwich, ## Indian Filter Kaapi} 0.002062146 117 ## [120] {Baked Samosa, ## Kadak Chai} 0.002044521 116 ## [121] {Adrak Chai, ## Sprouts Sauted} 0.002044521 116 ## [122] {Adrak Chai, ## Egg Sandwich} 0.002026896 115 ## [123] {Indian Filter Kaapi, ## Masala Omlette} 0.002026896 115 ## [124] {Americano, ## Indian Filter Kaapi} 0.001991646 113 ## [125] {Baked Samosa, ## Water Bottle 500 ML} 0.001991646 113 ## [126] {Frappe, ## Kadak Chai} 0.001991646 113 ## [127] {Elaichi Chai, ## Masala Chai} 0.001974020 112 ## [128] {Extra Cheese Grated, ## Extra Elaichi Flavor} 0.001956395 111 ## [129] {Anda Biryani, ## Water Bottle 500 ML} 0.001921145 109 ## [130] {French fries - Salted, ## Kadak Chai} 0.001921145 109 ## [131] {Chocolate Kaapi, ## Extra Cheese Grated} 0.001921145 109 ## [132] {Indian Filter Kaapi, ## Thandi Kaapi} 0.001903520 108 ## [133] {Kadak Chai, ## Thandi Kaapi} 0.001903520 108 ## [134] {Chocolate Kaapi, ## Indian Filter Kaapi} 0.001903520 108 ## [135] {Cheese Chutney, ## Indian Filter Kaapi} 0.001868269 106 ## [136] {Chocolate Chai, ## Water Bottle 500 ML} 0.001850644 105 ## [137] {Italian Noodles, ## Lemon Ice Tea} 0.001850644 105 ## [138] {Adrak Chai, ## Garlic Butter Bread Spread} 0.001833019 104 ## [139] {Extra Cheese Grated, ## Masala Chai} 0.001833019 104 ## [140] {Indian Filter Kaapi, ## Veg Biryani} 0.001815394 103 ## [141] {Adrak Chai, ## Chocolate Kaapi} 0.001815394 103 ## [142] {Elaichi Chai, ## Maska Bun} 0.001815394 103 ## [143] {Burnt Garlic Maggi, ## Water Bottle 500 ML} 0.001780143 101 ## [144] {CK Cheesy Blast Omelette, ## Water Bottle 500 ML} 0.001762518 100 ## [145] {Adrak Chai, ## Thandi Kaapi} 0.001762518 100 ## [146] {Aam Panna, ## Water Bottle 500 ML} 0.001744893 99 ## [147] {Indian Filter Kaapi, ## White Sauce Pasta} 0.001744893 99 ## [148] {Adrak Chai, ## Green Tea} 0.001727268 98 ## [149] {Green Tea, ## Kadak Chai} 0.001727268 98 ## [150] {Oreo Shake, ## Water Bottle 500 ML} 0.001727268 98 ## [151] {Irani Chai, ## Maska Bun} 0.001727268 98 ## [152] {French Fries ‚Äì Piri Piri, ## Masala Chai} 0.001727268 98 ## [153] {Mexican Maggi, ## Water Bottle 500 ML} 0.001709643 97 ## [154] {Chocolate Kaapi, ## Kadak Chai} 0.001709643 97 ## [155] {Kadak Chai, ## White Sauce Pasta} 0.001709643 97 ## [156] {Exotic Corn Mayo, ## Lemon Ice Tea} 0.001709643 97 ## [157] {Irani Chai, ## Water Bottle 500 ML} 0.001692018 96 ## [158] {Chocolate Chai, ## Kadak Chai} 0.001674392 95 ## [159] {Adrak Chai, ## Nachos with Dip} 0.001674392 95 ## [160] {CK Brownie Blast, ## Extra Cheese Grated} 
0.001674392 95 ## [161] {Bana Ke, ## Kadak Chai} 0.001639142 93 ## [162] {Extra Elaichi Flavor, ## Indian Filter Kaapi} 0.001639142 93 ## [163] {Masala Chai, ## Maska Bun} 0.001639142 93 ## [164] {French fries - Salted, ## Water Bottle 500 ML} 0.001621517 92 ## [165] {Extra Cheese Grated, ## Frappe} 0.001621517 92 ## [166] {Adrak Chai, ## Indian Filter Kaapi Large} 0.001621517 92 ## [167] {Americano, ## Kadak Chai} 0.001603892 91 ## [168] {CK Sandwich, ## Lemon Ice Tea} 0.001586266 90 ## [169] {anda Ghotala, ## Water Bottle 500 ML} 0.001551016 88 ## [170] {Baked Samosa, ## Bana Ke} 0.001551016 88 ## [171] {Elaichi Chai, ## Extra Cheese Grated} 0.001551016 88 ## [172] {Elaichi Chai, ## French Fries ‚Äì Piri Piri} 0.001533391 87 ## [173] {Elaichi Chai, ## Lemon Ice Tea} 0.001533391 87 ## [174] {Adrak Chai, ## Burnt Garlic Maggi} 0.001515766 86 ## [175] {CK Sandwich, ## Elaichi Chai} 0.001515766 86 ## [176] {Berry Blast, ## Water Bottle 500 ML} 0.001480515 84 ## [177] {Lemon Ice Tea, ## White Sauce Pasta} 0.001480515 84 ## [178] {Indian Filter Kaapi, ## Sprouts Sauted} 0.001462890 83 ## [179] {Indian Filter Kaapi, ## Nachos with Dip} 0.001445265 82 ## [180] {Exotic Corn Mayo, ## French Fries ‚Äì Piri Piri} 0.001445265 82 ## [181] {Vanilla Kaapi, ## Water Bottle 500 ML} 0.001427640 81 ## [182] {Adrak Chai, ## Baked Samosa} 0.001427640 81 ## [183] {Cheese Chutney, ## French Fries ‚Äì Piri Piri} 0.001427640 81 ## [184] {Kadak Chai, ## Small Kulladh} 0.001410015 80 ## [185] {Adrak Chai, ## Mexican Maggi} 0.001410015 80 ## [186] {Desi Noodle, ## Indian Filter Kaapi} 0.001410015 80 ## [187] {CK Sandwich, ## French Fries ‚Äì Piri Piri} 0.001410015 80 ## [188] {Baked Samosa, ## Paneer Puff} 0.001392389 79 ## [189] {Kadak Chai, ## Sprouts Sauted} 0.001392389 79 ## [190] {Frappe, ## Lemon Ice Tea} 0.001392389 79 ## [191] {Chilli Garlic Cheese Toast, ## Elaichi Chai} 0.001392389 79 ## [192] {Chocolate Kaapi, ## Lemon Ice Tea} 0.001374764 78 ## [193] {Kiwi Mint Banana, ## Water Bottle 500 ML} 0.001357139 77 ## [194] {Extra Elaichi Flavor, ## Kadak Chai} 0.001357139 77 ## [195] {Kadak Chai, ## Mexican Bhel Poori} 0.001339514 76 ## [196] {CK Pasta, ## Water Bottle 500 ML} 0.001321889 75 ## [197] {Chilli Garlic Cheese, ## Water Bottle 500 ML} 0.001321889 75 ## [198] {Mexican Bhel Poori, ## Water Bottle 500 ML} 0.001321889 75 ## [199] {Apple Mojito, ## Water Bottle 500 ML} 0.001304264 74 ## [200] {Extra Elaichi Flavor, ## Maska Bun} 0.001304264 74 ## [201] {Lemon Ice Tea, ## Orange Ice Tea} 0.001286638 73 ## [202] {Adrak Chai, ## Mexican Bhel Poori} 0.001286638 73 ## [203] {Chocolate Shake, ## Water Bottle 500 ML} 0.001286638 73 ## [204] {Indori Upma, ## Sprouts Sauted} 0.001286638 73 ## [205] {French Fries ‚Äì Piri Piri, ## Veg Club} 0.001286638 73 ## [206] {Extra IFC Decoaction, ## Water Bottle 500 ML} 0.001269013 72 ## [207] {Americano, ## Water Bottle 500 ML} 0.001269013 72 ## [208] {French fries - Salted, ## Indian Filter Kaapi} 0.001269013 72 ## [209] {Exotic Corn Mayo, ## Italian Noodles} 0.001269013 72 ## [210] {Lemon Ice Tea, ## Veg Club} 0.001269013 72 ## [211] {Garlic Butter Bread Spread, ## Kadak Chai} 0.001251388 71 ## [212] {Lemon Ice Tea, ## Peach Ice Tea} 0.001251388 71 ## [213] {Peach Ice Tea, ## Water Bottle 500 ML} 0.001251388 71 ## [214] {Baked Samosa, ## Indian Filter Kaapi} 0.001251388 71 ## [215] {Cheese Chutney, ## Lemon Ice Tea} 0.001251388 71 ## [216] {CK Cheesy Blast Fries, ## Water Bottle 500 ML} 0.001233763 70 ## [217] {Kadak Chai, ## Masala Lemonade} 0.001233763 70 ## [218] 
{Frappe, ## French Fries ‚Äì Piri Piri} 0.001233763 70 ## [219] {Cheese Chutney, ## Elaichi Chai} 0.001233763 70 ## [220] {Lemon Ice Tea, ## Masala Chai} 0.001233763 70 ## [221] {Chocolate Kaapi, ## Vanilla Kaapi} 0.001216138 69 ## [222] {Adrak Chai, ## CK Cheesy Blast Omelette} 0.001216138 69 ## [223] {Indian Filter Kaapi, ## Small Kulladh} 0.001216138 69 ## [224] {Extra Cheese Grated, ## Mexican Maggi} 0.001216138 69 ## [225] {Adrak Chai, ## Chilli Garlic Cheese} 0.001216138 69 ## [226] {Kadak Chai, ## Nachos with Dip} 0.001216138 69 ## [227] {Chilli Garlic Cheese Toast, ## Lemon Ice Tea} 0.001216138 69 ## [228] {CK Sandwich, ## Masala Chai} 0.001216138 69 ## [229] {Chocolate Kaapi, ## French Fries ‚Äì Piri Piri} 0.001198512 68 ## [230] {French Fries ‚Äì Piri Piri, ## Italian Noodles} 0.001198512 68 ## [231] {Adrak Chai, ## CK Brownie Blast} 0.001180887 67 ## [232] {Italian Noodles, ## White Sauce Pasta} 0.001180887 67 ## [233] {CK Nimbu Pani, ## Water Bottle 500 ML} 0.001163262 66 ## [234] {Chilli Garlic Cheese, ## Kadak Chai} 0.001163262 66 ## [235] {Lemon Ice Tea, ## Thandi Kaapi} 0.001163262 66 ## [236] {Indori Upma, ## Masala Omlette} 0.001163262 66 ## [237] {Chilli Garlic Cheese Toast, ## Masala Chai} 0.001163262 66 ## [238] {Extra IFC Decoaction, ## Indian Filter Kaapi Large} 0.001145637 65 ## [239] {Masala Chai, ## Small Kulladh} 0.001145637 65 ## [240] {Chocolate Wallnut Brownie, ## Water Bottle 500 ML} 0.001128012 64 ## [241] {Kadak Chai, ## Paneer Sandwich} 0.001128012 64 ## [242] {Lemon Ice Tea, ## Veg Biryani} 0.001128012 64 ## [243] {Lemon Ice Tea, ## Paneer Puff} 0.001128012 64 ## [244] {CK Pasta, ## Extra Cheese Grated} 0.001110387 63 ## [245] {French Fries ‚Äì Piri Piri, ## White Sauce Pasta} 0.001110387 63 ## [246] {Chilli Garlic Cheese Toast, ## Italian Noodles} 0.001110387 63 ## [247] {Chilli Garlic Cheese, ## Extra Cheese Grated} 0.001092761 62 ## [248] {Green Tea, ## Indian Filter Kaapi} 0.001092761 62 ## [249] {Extra Adrak Flavor, ## Masala Chai} 0.001092761 62 ## [250] {Elaichi Chai, ## Extra Adrak Flavor} 0.001092761 62 ## [251] {Lemon Ice Tea, ## Nachos with Dip} 0.001092761 62 ## [252] {Elaichi Chai, ## Masala Omlette} 0.001092761 62 ## [253] {Chilli Garlic Cheese Toast, ## Extra Cheese Grated} 0.001092761 62 ## [254] {Masala Chai, ## Veg Club} 0.001092761 62 ## [255] {Small Kulladh, ## Water Bottle 500 ML} 0.001075136 61 ## [256] {Chocolate Chai, ## Indian Filter Kaapi} 0.001075136 61 ## [257] {Desi Noodle, ## French Fries ‚Äì Piri Piri} 0.001075136 61 ## [258] {Cheese Chutney, ## Veg Club} 0.001075136 61 ## [259] {Elaichi Chai, ## Veg Club} 0.001075136 61 ## [260] {Adrak Chai, ## Green Tea Lemon} 0.001057511 60 ## [261] {Indian Filter Kaapi, ## Paneer Sandwich} 0.001057511 60 ## [262] {Exotic Corn Mayo, ## Masala Chai} 0.001057511 60 ## [263] {Adrak Chai, ## Adrak Chai Full} 0.001039886 59 ## [264] {Green Tea Lemon, ## Water Bottle 500 ML} 0.001039886 59 ## [265] {Chocolate Wallnut Brownie, ## Kadak Chai} 0.001039886 59 ## [266] {Adrak Chai, ## Peach Ice Tea} 0.001039886 59 ## [267] {Adrak Chai, ## Masala Lemonade} 0.001039886 59 ## [268] {Extra Cheese Grated, ## Masala Omlette} 0.001039886 59 ## [269] {Indian Filter Kaapi Large, ## Lemon Ice Tea} 0.001039886 59 ## [270] {Elaichi Chai, ## Exotic Corn Mayo} 0.001039886 59 ## [271] {Masala Chai, ## Paneer Puff} 0.001039886 59 ## [272] {CK Tadka Burger, ## Extra Cheese Slice} 0.001004635 57 ## [273] {Elaichi Chai, ## Indori Upma} 0.001004635 57 ## [274] {Chilli Garlic Cheese Toast, ## French Fries ‚Äì Piri Piri} 
0.001004635 57 What items are consumed in groups of three? inspect(its[size(its) == 3]) ## items support count ## [1] {Indian Filter Kaapi, ## Kadak Chai, ## Water Bottle 500 ML} 0.005587183 317 ## [2] {Adrak Chai, ## Kadak Chai, ## Water Bottle 500 ML} 0.004617798 262 ## [3] {Adrak Chai, ## Extra Elaichi Flavor, ## Water Bottle 500 ML} 0.003771789 214 ## [4] {Adrak Chai, ## Indian Filter Kaapi, ## Water Bottle 500 ML} 0.003525037 200 ## [5] {Kadak Chai, ## Paneer Puff, ## Water Bottle 500 ML} 0.002485151 141 ## [6] {Adrak Chai, ## Kadak Chai, ## Maska Bun} 0.002220773 126 ## [7] {Elaichi Chai, ## Kadak Chai, ## Water Bottle 500 ML} 0.002185523 124 ## [8] {Kadak Chai, ## Maska Bun, ## Water Bottle 500 ML} 0.002150272 122 ## [9] {Adrak Chai, ## Extra Cheese Grated, ## Water Bottle 500 ML} 0.002150272 122 ## [10] {CK Sandwich, ## Extra Cheese Grated, ## Water Bottle 500 ML} 0.002115022 120 ## [11] {Extra Adrak Flavor, ## Kadak Chai, ## Maska Bun} 0.001956395 111 ## [12] {Adrak Chai, ## Elaichi Chai, ## Water Bottle 500 ML} 0.001885895 107 ## [13] {Extra Cheese Grated, ## Veg Club, ## Water Bottle 500 ML} 0.001850644 105 ## [14] {Adrak Chai, ## Extra Cheese Grated, ## Extra Elaichi Flavor} 0.001833019 104 ## [15] {Bana Ke, ## Paneer Puff, ## Water Bottle 500 ML} 0.001797769 102 ## [16] {Exotic Corn Mayo, ## Extra Cheese Grated, ## Water Bottle 500 ML} 0.001709643 97 ## [17] {Indian Filter Kaapi, ## Indian Filter Kaapi Large, ## Water Bottle 500 ML} 0.001639142 93 ## [18] {Adrak Chai, ## Maska Bun, ## Water Bottle 500 ML} 0.001603892 91 ## [19] {Adrak Chai, ## CK Sandwich, ## Water Bottle 500 ML} 0.001533391 87 ## [20] {CK Sandwich, ## Kadak Chai, ## Water Bottle 500 ML} 0.001533391 87 ## [21] {Adrak Chai, ## CK Sandwich, ## Extra Cheese Grated} 0.001498141 85 ## [22] {Indian Filter Kaapi Large, ## Kadak Chai, ## Water Bottle 500 ML} 0.001462890 83 ## [23] {Adrak Chai, ## Extra Elaichi Flavor, ## Indian Filter Kaapi} 0.001462890 83 ## [24] {Bana Ke, ## Kadak Chai, ## Paneer Puff} 0.001445265 82 ## [25] {Adrak Chai, ## Indian Filter Kaapi, ## Kadak Chai} 0.001392389 79 ## [26] {Indian Filter Kaapi, ## Paneer Puff, ## Water Bottle 500 ML} 0.001304264 74 ## [27] {Indian Filter Kaapi, ## Kadak Chai, ## Maska Bun} 0.001198512 68 ## [28] {Adrak Chai, ## Italian Noodles, ## Water Bottle 500 ML} 0.001198512 68 ## [29] {Indori Upma, ## Kadak Chai, ## Water Bottle 500 ML} 0.001163262 66 ## [30] {Extra Cheese Grated, ## Water Bottle 500 ML, ## White Sauce Pasta} 0.001163262 66 ## [31] {Adrak Chai, ## Extra Adrak Flavor, ## Kadak Chai} 0.001145637 65 ## [32] {Adrak Chai, ## Cheese Chutney, ## Water Bottle 500 ML} 0.001145637 65 ## [33] {Adrak Chai, ## Extra Cheese Grated, ## Veg Club} 0.001128012 64 ## [34] {Adrak Chai, ## Extra Adrak Flavor, ## Maska Bun} 0.001110387 63 ## [35] {Adrak Chai, ## French Fries ‚Äì Piri Piri, ## Water Bottle 500 ML} 0.001110387 63 ## [36] {Adrak Chai, ## Masala Chai, ## Water Bottle 500 ML} 0.001110387 63 ## [37] {Adrak Chai, ## Elaichi Chai, ## Kadak Chai} 0.001092761 62 ## [38] {Extra Cheese Grated, ## Kadak Chai, ## Water Bottle 500 ML} 0.001092761 62 ## [39] {Adrak Chai, ## Paneer Puff, ## Water Bottle 500 ML} 0.001075136 61 ## [40] {Extra Cheese Grated, ## Indian Filter Kaapi, ## Water Bottle 500 ML} 0.001075136 61 ## [41] {Adrak Chai, ## Veg Club, ## Water Bottle 500 ML} 0.001057511 60 ## [42] {Adrak Chai, ## Chilli Garlic Cheese Toast, ## Water Bottle 500 ML} 0.001004635 57 What items are consumed in groups of four? 
inspect(its[size(its) > 3]) ## items support count ## [1] {Adrak Chai, ## Extra Adrak Flavor, ## Kadak Chai, ## Maska Bun} 0.001022261 58 What are the business implications of these? Water Bottle 500 ML looks like it’s sold alongside a lot of items. As a business, consider offering it as a discounted pair: for example, if a bottle of water costs $5 on its own, price it at $3 when bought with a chai. Representing Itemsets Maximal Itemsets In the itemsets found above, we included both the itemsets and their supersets. However, that would not make a lot of business sense.
For example, consider the itemset {Adrak Chai, Maska Bun, Water Bottle 500 ML}. If we include this, should we also include {Adrak Chai, Water Bottle 500 ML}? Probably not.
The function is.maximal keeps only those itemsets for which no proper superset exists.
its_max = its[is.maximal(its)] its_max ## set of 309 itemsets Let’s look at them.
inspect(head(its_max, by = "support")) ## items support count ## [1] {Employee Meal, ## Kadak Chai} 0.018700319 1061 ## [2] {Sultan’s Kaapi} 0.008389587 476 ## [3] {Lemon Ice Tea, ## Water Bottle 500 ML} 0.008336711 473 ## [4] {Orange Slush} 0.006133564 348 ## [5] {Cinnamon Kaapi} 0.005851561 332 ## [6] {Indian Filter Kaapi, ## Kadak Chai, ## Water Bottle 500 ML} 0.005587183 317 Association Rule Mining These rules are to be interpreted as If This Then That (IFTTT).
rules = apriori(trans, parameter = list(support = 0.001, confidence = 0.2)) ## Apriori ## ## Parameter specification: ## confidence minval smax arem aval originalSupport maxtime support minlen ## 0.2 0.1 1 none FALSE TRUE 5 0.001 1 ## maxlen target ext ## 10 rules TRUE ## ## Algorithmic control: ## filter tree heap memopt load sort verbose ## 0.1 TRUE TRUE FALSE TRUE 2 TRUE ## ## Absolute minimum support count: 56 ## ## set item appearances ...[0 item(s)] done [0.00s]. ## set transactions ...[211 item(s), 56737 transaction(s)] done [0.00s]. ## sorting and recoding items ... [123 item(s)] done [0.00s]. ## creating transaction tree ... done [0.01s]. ## checking subsets of size 1 2 3 4 done [0.00s]. ## writing ... [131 rule(s)] done [0.00s]. ## creating S4 object ... done [0.00s].
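An aside: if you only care about rules that end in one particular product (say, everything that nudges people towards Maska Bun), you do not have to mine and then sift through the full rule set. The sketch below uses the appearance argument of apriori(); the argument itself is standard arules, but this particular run is my illustration rather than part of the original analysis.
# Only mine rules whose right-hand side is Maska Bun
bun_rules = apriori(
  trans,
  parameter = list(support = 0.001, confidence = 0.2),
  appearance = list(rhs = "Maska Bun", default = "lhs")
)
inspect(sort(bun_rules, by = "lift"))
Every rule returned then has Maska Bun as the consequent, which is handy when a category manager asks about a single product.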
length(rules) ## [1] 131 inspect(head(rules)) ## lhs rhs support confidence ## [1] {} =\u0026gt; {Kadak Chai} 0.245166294 0.2451663 ## [2] {Kulladh Chai} =\u0026gt; {Extra Adrak Flavor} 0.002132647 0.5845411 ## [3] {Extra Adrak Flavor} =\u0026gt; {Kulladh Chai} 0.002132647 0.2494845 ## [4] {Adrak Chai Full} =\u0026gt; {Adrak Chai} 0.001039886 0.2243346 ## [5] {Extra Cheese Slice} =\u0026gt; {CK Tadka Burger} 0.001004635 0.2968750 ## [6] {Garlic Butter Bread Spread} =\u0026gt; {Adrak Chai} 0.001833019 0.3623693 ## coverage lift count ## [1] 1.000000000 1.000000 13910 ## [2] 0.003648413 68.381662 121 ## [3] 0.008548214 68.381662 121 ## [4] 0.004635423 1.305711 59 ## [5] 0.003384035 48.263028 57 ## [6] 0.005058427 2.109125 104 Let’s see their quality quality(head(rules)) ## support confidence coverage lift count ## 1 0.245166294 0.2451663 1.000000000 1.000000 13910 ## 2 0.002132647 0.5845411 0.003648413 68.381662 121 ## 3 0.002132647 0.2494845 0.008548214 68.381662 121 ## 4 0.001039886 0.2243346 0.004635423 1.305711 59 ## 5 0.001004635 0.2968750 0.003384035 48.263028 57 ## 6 0.001833019 0.3623693 0.005058427 2.109125 104 Rules with highest lift rules = sort(rules, by = \u0026#34;lift\u0026#34;) inspect(head(rules, n = 10)) ## lhs rhs support confidence coverage lift count ## [1] {Kulladh Chai} =\u0026gt; {Extra Adrak Flavor} 0.002132647 0.5845411 0.003648413 68.38166 121 ## [2] {Extra Adrak Flavor} =\u0026gt; {Kulladh Chai} 0.002132647 0.2494845 0.008548214 68.38166 121 ## [3] {Adrak Chai, ## Kadak Chai, ## Maska Bun} =\u0026gt; {Extra Adrak Flavor} 0.001022261 0.4603175 0.002220773 53.84955 58 ## [4] {Extra Cheese Slice} =\u0026gt; {CK Tadka Burger} 0.001004635 0.2968750 0.003384035 48.26303 57 ## [5] {Adrak Chai, ## Extra Adrak Flavor, ## Kadak Chai} =\u0026gt; {Maska Bun} 0.001022261 0.8923077 0.001145637 37.30793 58 ## [6] {Extra Adrak Flavor, ## Kadak Chai} =\u0026gt; {Maska Bun} 0.001956395 0.7872340 0.002485151 32.91474 111 ## [7] {Kadak Chai, ## Paneer Puff} =\u0026gt; {Bana Ke} 0.001445265 0.2029703 0.007120574 28.22531 82 ## [8] {Bana Ke, ## Kadak Chai} =\u0026gt; {Paneer Puff} 0.001445265 0.8817204 0.001639142 23.01112 82 ## [9] {Bana Ke, ## Water Bottle 500 ML} =\u0026gt; {Paneer Puff} 0.001797769 0.8571429 0.002097397 22.36969 102 ## [10] {Bana Ke} =\u0026gt; {Paneer Puff} 0.005745810 0.7990196 0.007191075 20.85279 326 Visualisation You can also visualise the rules you created, thanks to arulesViz package.\nplot(rules) ## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter. Plot with order of the itemset.\nplot(rules, shading = \u0026#34;order\u0026#34;) ## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter. Grouped plot plot(rules, method = \u0026#34;grouped\u0026#34;) Graph plot plot(rules, method = \u0026#34;graph\u0026#34;) ## Warning: Too many rules supplied. Only plotting the best 100 using ## 'lift' (change control parameter max if needed). ## Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider ## increasing max.overlaps There are too many rules. 
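One way to thin them out without mining again is to filter the rules object we already have, using the subset() method from arules. The thresholds below are arbitrary values I picked for illustration:
# Keep only strong, high-lift rules from the existing rule set
strong_rules = subset(rules, subset = lift > 20 & confidence > 0.5)
inspect(strong_rules)
The other option is to re-mine with stricter parameters, which also keeps the plots light.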
Let’s retune the parameters for fewer rules.
rules = apriori(trans, parameter = list(support = 0.001, confidence = 0.4)) ## Apriori ## ## Parameter specification: ## confidence minval smax arem aval originalSupport maxtime support minlen ## 0.4 0.1 1 none FALSE TRUE 5 0.001 1 ## maxlen target ext ## 10 rules TRUE ## ## Algorithmic control: ## filter tree heap memopt load sort verbose ## 0.1 TRUE TRUE FALSE TRUE 2 TRUE ## ## Absolute minimum support count: 56 ## ## set item appearances ...[0 item(s)] done [0.00s]. ## set transactions ...[211 item(s), 56737 transaction(s)] done [0.00s]. ## sorting and recoding items ... [123 item(s)] done [0.00s]. ## creating transaction tree ... done [0.01s]. ## checking subsets of size 1 2 3 4 done [0.00s]. ## writing ... [26 rule(s)] done [0.00s]. ## creating S4 object ... done [0.00s]. plot(rules, method = "graph") Interactive Table and Visualisation You can also explore the rules interactively.
Table of Rules inspectDT(rules) Plot of Rules plot(rules, engine = "html") Matrix of Rules plot(rules, method = "matrix", engine = "html") Graph of Rules plot(rules, method = "graph", engine = "html") Single-shot Analysis You can also simply pass the data to ruleExplorer() to mine and visualise the rules directly.
ruleExplorer(df) Reference A large part of this tutorial follows the book chapter Association Analysis: Basic Concepts and Algorithms.
This was originally presented to MS (Business Analytics) students on November 21, 2022 at the Haslam College of Business, University of Tennessee, in Prof Charles Liu’s class on Data Mining. Thanks to Prof Charles for providing me with the opportunity and resources to make this class a success.
","permalink":"/ck-cafe/","summary":"In this lab session, I share how to use the apriori algorithm for association mining. The goal is to find useful association rules which can help in designing promotions for the company. Plus, you get to see what’s served at an Indian cafe.","title":"CK Cafe: Using Association Rules to Find Basket of Goods"},{"content":" Population collapse is a theory that says that if growth rates continue to decline the way they are right now, we would eventually reach zero population growth: a stage where the population neither grows nor declines. That is, the number of births plus in-migrants equals the number of deaths plus out-migrants. While this may sound cheerful, you may not have considered the pitfalls yet. A smaller pool of working adults would mean lower tax revenue and thus lower funds for welfare, the very welfare that the old and the young desperately need.
I had learnt about the concept of population collapse in my population studies class, but it did not really sink in until YouTube showed me this video.
One measure of population collapse, according to @elonmusk at least, is the ratio of baby diaper sales to adult diaper sales.1 Japan is leading all the way, while China is closely following it.
Sometimes it also leads to innovative products:
The company, she said, took into consideration that Japanese seniors, already petite by American standards, tend to become even more so with age. The average height of a Japanese woman, notes Yamanaka, is 157 centimeters (5 feet, 1 and a half inches), “however, this senior citizen, 50s and 60s, the average height is 152 centimeters, (just under five feet), five centimeters shorter,” she said.
“So we make products which is more suitable for them — this height.
For example refrigerators and washing machines designed for them to easily take things out.”
Because of their stature, products designed elsewhere can be inconvenient for a smaller Japanese population, requiring climbing or reaching to access wet clothes in a washing machine easily.
“In Japan, most people have backache,” Yamanaka said.
In rural areas, many Japanese live in houses with two stories. And stairs. “And these people will clean up their house, stairs, holding up vacuum cleaners. And for them, the average weight, four kilo, was very heavy.” Almost nine pounds. So Panasonic created a lighter vacuum.
even their fashion shows have adult diaper collections now. a collection of 170 adult diapers and smaller pads was launched in Japan. Men and women walked, striking a pose to the tunes of 1980s British and American pop and rock music. pic.twitter.com/6f5c35Q823
— Harshvardhan (@harshbutjust) November 8, 2022 A few days ago, I was discussing this topic with a friend. She was concerned that her maid in New Delhi had too many children, more than she could support. Overpopulation, especially among the economically weaker sections of the population, can get pretty severe. However, I think this is a short-term view; I would prefer a long-term one. If the population growth rate keeps decreasing at its current pace, we will have problems.
If you are curious about population decline, I can recommend some good starting points.
Population decline, Wikipedia
Zero population growth, Wikipedia
Coleman, David, and Robert Rowthorn. “Who’s afraid of population decline? A critical examination of its consequences.” Population and Development Review, 37 (2011): 217-248.
Ranked: The 20 Countries With the Fastest Declining Populations
Is the sale of baby diapers 👶 increasing faster than adult diapers 🧓? When I decided to look at the data, I found mixed results. The ratio of adult diapers to baby diapers sold varies a lot between countries.
For countries with rapidly ageing populations like Japan, the ratio is approaching 0.6, while for young countries like India and Bangladesh, it is much smaller. See the table below for the sales and the ratio calculated for the year 2021.
There is a confounding effect of accessibility as well. Adult diapers are not readily available in developing countries. Even where they are available, people may not like wearing them because it makes them feel infantile. This stigma associated with diapers — diapers are for kids, not adults — plays a role in their low sales in young countries, which generally also happen to be developing countries.
With this caveat in mind, let us get down to exploring these numbers as a time series. The orange bars represent sales of baby diapers in millions of USD, while the purple bars represent sales of incontinence, or adult, diapers.2
Japan 🇯🇵 The glaring example is Japan, a country ageing really fast. The sale of baby diapers is growing at 10.2%, while adult diapers are growing at 8.1% — quite close.
United States of America 🇺🇸 Both are growing, but it is clear that adult diaper sales are growing faster. The USA’s absolute sales volume is much higher, for a country of 331 million.
India 🇮🇳 India is one of the youngest countries right now. Baby diaper sales are increasing fast, but adult diapers? Not so much.
China I cannot understand China’s data well. Period.
Russia I’ve no clue if sales are increasing or decreasing 🤷
Data All data is sourced from Statista.
Thanks to the University of Tennessee, I have free access to Statista. Libraries are the best! If you need the data for all years from 2014 to 2027, let me know.
The code to generate the plots can be found here.
Incontinence, if you’re being pedantic. ↩︎
Funny story: my first research project in my PhD was with a company that manufactured incontinence diapers. If I had learnt about this before, I would have asked them for their perspective. ↩︎
","permalink":"/population-collapse/","summary":"<script src="/population-collapse/index_files/twitter-widget/widgets.js"></script>
<script src="/population-collapse/index_files/core-js/shim.min.js"></script>
<script src="/population-collapse/index_files/react/react.min.js"></script>
<script src="/population-collapse/index_files/react/react-dom.min.js"></script>
<script src="/population-collapse/index_files/reactwidget/react-tools.js"></script>
<script src="/population-collapse/index_files/htmlwidgets/htmlwidgets.js"></script>
<script src="/population-collapse/index_files/reactable-binding/reactable.js"></script>
<p><img loading="lazy" src="/population-collapse/images/Screenshot%202022-11-10%20at%208.32.15%20PM.png"></p>
<p>Population collapse is a theory that says that if growth rates continue to decline the way they are right now, we would eventually reach zero population growth: a stage where the population neither grows nor declines. That is, the number of births plus in-migrants equals the number of deaths plus out-migrants. While this may sound cheerful, you may not have considered the pitfalls yet. A smaller pool of working adults would mean lower tax revenue and thus lower funds for welfare, the very welfare that the old and the young desperately need.</p>","title":"Is the world population going to collapse?"},{"content":"This summer was a fantastic experience for me. I spent three months toying with different machine learning models to build large-scale forecasts for HP’s print business. I went in with hardly a year of experience in Python. I came out with a deep understanding of how Python works, internally and visually. From textbook examples of regression and time-series forecasting, I went to creating forecasts for thousands of products that HP produces worldwide.
Project Forecasting Demand My internship project was primarily focused on using machine learning to forecast the demand for printers worldwide. Caroline Johnston, my co-intern from the University of Southern California, and I created the models for better demand signal prediction. We developed models with LightGBM and tuned them with FLAML. To keep track of our numerous experiments, we used MLflow.
HP manufactures and sells over 10,000 print-related products in over 170 countries. An accurate demand forecast is vital, as demand forecasts drive supply planning. If we forecast higher than actuals, there is a cost of overage; if we predict too low, there is a cost of underage. We do not want too much inventory; we do not want too little, either. Besides this primary project, we also worked on two other projects: recording data quality issues and designing holistic metrics.
Recording Data Quality Issues HP believes Data is an asset (Curtland et al., 2022).
In 2021, the company gained the new imperative that values data as an asset and places a premium on data quality, access, and utility. This idea evolved from Data Science practitioners and was presented to the HP board and other audiences, including the 2021 Data Science and Knowledge Discovery Summit, an HP internal summit with 100+ presentations over three days.1
We created Python code that identified errors before they were fed into the model and logged them separately into a spreadsheet. The business and IT teams could use the error reports to fix the problems at the source.
Designing Holistic Metrics Businesses care about different metrics than machine learning models do. For example, the cost of overage for HP is often less than that of underage, so planners prefer over-forecasting to under-forecasting. However, a machine learning model’s loss function is usually symmetric: RMSE, WMAPE, and almost every standard metric treat a forecast \(x\) units over the same as one \(x\) units under.
We (largely Caroline) also designed holistic metrics to synthesise business and ML accuracy results, alongside the automatic data quality error detection described above.
Team I worked with the Strategic Planning and Modelling team, abbreviated internally as SPaM.2
The SPaM team has extensive experience in supply chain analytics. They have worked on numerous mission-critical projects over the last 40 years, and my task was no different. My manager, Barrett Crane, and my project mentors, Cara Curtland and Jerry Hwang, were invested in our success throughout the internship.3
Shawn Tay, another SPaMster, taught me ways to think about my career. He suggested that I look at my career the way a project consultant would: currently, I’m working on a project with HP where I have to maximise my contribution so that the client is happy and satisfied. In the process, I should aim to upskill myself continuously. A better consultant gets a better project, which makes them a better consultant still. It is a continuous improvement loop.4
Other than that, I also had the fortune to learn from Chuck VanDam, Frederic Marie and Pedro Neto. They were working on a personal systems part optimisation project that I cannot share a lot about. Chuck’s ability to sharpen arguments in PowerPoint slides is remarkable.
Spreadsheet Modelling Workshop The SPaM team in Vancouver also organised an Excel workshop where we learnt interactive spreadsheet modelling. I also got the opportunity to talk about data manipulation in the session!
InternStellar Award Our internship project was the first runner-up in the technical contribution category at the HP InternStellar Award competition.5 Here are Caroline and me with our final poster at the Interns Poster Fair.
Fun continues… I thoroughly enjoyed the work. The problem was intellectually stimulating, and while many of our models are beating the current forecast performance, we still have lots of ideas for further model improvement. Therefore, I’m continuing the work through the school year, working part-time with them!
Thank you, Meenal and Cara, for reviewing an early draft of this post. Your comments resulted in significant revisions and polished it.
From HP Inc. Advanced Analytics Powers Technology in the Service of Humanity:
Formalized in 2016, DSKD is composed of more than 3,000 members who hold biweekly knowledge sharing sessions and annual internal summits. The 2021 summit included 114 presentations over three days, culled from more than 500 submitted papers.
Topics presented included proactive and predictive services, deep learning, reinforcement learning, data preparation and feature engineering, product improvement, machine learning (ML) and artificial intelligence (AI)-enabled automation, mixed-integer time-phased optimization, business process automation and much more.
 ↩︎ The acronym predates spam email becoming a problem; the word wasn’t in general lingo back when HP SPaM was founded in 1989. ↩︎
Even today, there’s rarely a meeting with Jerry where I don’t learn something new about Python, Jupyter, and their ilk. ↩︎
I believe this idea comes from a book that either Caroline or Shawn mentioned, but I don’t remember the name. ↩︎
The winner developed a method that could print on any fiber rather quickly. I’m perfectly fine with the second position; he deserved it. ↩︎
","permalink":"/hp22/","summary":"Forecasting Global Print Demand Using Machine Learning","title":"Supply Chain Analytics at HP Inc."},{"content":"Life is too short to learn only from your own mistakes; you need to play catch-up with people who tried new things. Most people do not document their learnings. The rare culturati who note their understandings in essays easily trump the large group that keeps its learnings to itself.
I have read many interesting essays. Some of them stuck with me — like fingers working with super glue. I revisit them often. When I reread them, I often see myself clinging to an awfully good section. In this live post, I will share some of those good nuggets.
Everything that needs to be said has already been said. But since no one was listening, everything must be said again.
— André Gide
Life is Short Paul Graham, http://www.paulgraham.com/vb.html
Paul Graham is one of my favourite essayists. I’ve read a lot of his essays. This one in particular frequently makes me rethink what is important in life and what is not. Eliminate the unnecessary; focus on the required.
Ok, so life actually is short. Does it make any difference to know that?
It has for me. It means arguments of the form “Life is too short for x” have great force. It’s not just a figure of speech to say that life is too short for something. It’s not just a synonym for annoying. If you find yourself thinking that life is too short for something, you should try to eliminate it if you can.
When I ask myself what I’ve found life is too short for, the word that pops into my head is “bullshit.” I realize that answer is somewhat tautological. It’s almost the definition of bullshit that it’s the stuff that life is too short for. And yet bullshit does have a distinctive character. There’s something fake about it. It’s the junk food of experience.
But while some amount of bullshit is inevitably forced on you, the bullshit that sneaks into your life by tricking you is no one’s fault but your own. And yet the bullshit you choose may be harder to eliminate than the bullshit that’s forced on you. Things that lure you into wasting your time have to be really good at tricking you. An example that will be familiar to a lot of people is arguing online.
When someone contradicts you, they\u0026rsquo;re in a sense attacking you. Sometimes pretty overtly. Your instinct when attacked is to defend yourself. But like a lot of instincts, this one wasn\u0026rsquo;t designed for the world we now live in. Counterintuitive as it feels, it\u0026rsquo;s better most of the time not to defend yourself. Otherwise these people are literally taking your life.\nThere is no speed limit Derek Sivers, https://sive.rs/kimo\nThe \u0026ldquo;defined\u0026rdquo; pace exists so that anyone could do it. If you are driven and realise there is no speed limit, you can travel faster, hence further in life.\nAfter a one-minute welcome, we were sitting at the piano, analyzing the sheet music for a jazz standard. He was quickly explaining the chords based on the diatonic scale \u0026mdash; how the dissonance of the tri-tone in the 5-chord with the flat-7 is what makes it want to resolve to the 1. Within a minute, he started quizzing me.\n\u0026ldquo;If the 5-chord with the flat-7 has that tri-tone, then so does another flat-7 chord. Which one?\u0026rdquo;\n\u0026ldquo;Uh\u0026hellip; the flat-2 chord?\u0026rdquo;\n\u0026ldquo;Right! So that\u0026rsquo;s a substitute chord. Any flat-7 chord can be substituted with the other flat-7 that shares the same tri-tone. So reharmonize all the chords you can in this chart. Go.\u0026rdquo;\nThe pace was intense, and I loved it. Finally, someone was challenging me \u0026mdash; keeping me in over my head \u0026mdash; encouraging and expecting me to pull myself up quickly. I was learning so fast, it felt like the adrenaline rush you get while playing a video game. He tossed every fact at me and made me prove that I got it.\nDo things, tell people. Carl Lange, http://carl.flax.ie/dothingstellpeople.html\nThese are the only things you need to do to be successful. You can get away with just doing one of the two, but that\u0026rsquo;s rare, and usually someone else is doing the other part for you.\n\u0026hellip;\nThen make something that you can talk about. Make something cool. Something interesting. Spend time on it. Go crazy. Even if it\u0026rsquo;s the least useful thing you\u0026rsquo;ve ever made, if you can talk about it, make it. This part is easy, because you\u0026rsquo;re doing something you think is cool, and interesting, and if it\u0026rsquo;s useless, great, because you won\u0026rsquo;t need to support it much either!\n\u0026hellip;\nYou would not believe how much opportunity is out there for those who do things and tell people. It\u0026rsquo;s how you travel the entreprenurial landscape. You do something interesting and you tell everyone about it.\nYou and Your Research Richard Hamming, http://www.paulgraham.com/hamming.html\nThis is probably the gem of this list, especially at this point of my life when I\u0026rsquo;m trying to find what is a good research. If you\u0026rsquo;re serious about your career, you should check it out. You can also watch the talk here: https://www.youtube.com/1\nLuck Let me start not logically, but psychologically. I find that the major objection is that people think great science is done by luck. It\u0026rsquo;s all a matter of luck. Well, consider Einstein. Note how many different things he did that were good. Was it all luck? Wasn\u0026rsquo;t it a little too repetitive? Consider Shannon. He didn\u0026rsquo;t do just information theory. Several years before, he did some other good things and some which are still locked up in the security of cryptography. 
He did many good things.
…
So yes, it is luck. The particular thing you do is luck, but that you do something is not.
For example, when I came to Bell Labs, I shared an office for a while with Shannon. At the same time he was doing information theory, I was doing coding theory. It is suspicious that the two of us did it at the same place and at the same time — it was in the atmosphere. And you can say, “Yes, it was luck.” On the other hand you can say, “But why of all the people in Bell Labs then were those the two who did it?” Yes, it is partly luck, and partly it is the prepared mind; but “partly” is the other thing I’m going to talk about.
Ardent Curiosity One of the characteristics you see, and many people have it including great scientists, is that usually when they were young they had independent thoughts and had the courage to pursue them. For example, Einstein, somewhere around 12 or 14, asked himself the question, “What would a light wave look like if I went with the velocity of light to look at it?”
…
One of the characteristics of successful scientists is having courage. Once you get your courage up and believe that you can do important problems, then you can. If you think you can’t, almost surely you are not going to. Courage is one of the things that Shannon had supremely. You have only to think of his major theorem. He wants to create a method of coding, but he doesn’t know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, “What would the average random code do?” He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. Who but a man of infinite courage could have dared to think those thoughts? That is the characteristic of great scientists; they have courage. They will go forward under incredible circumstances; they think and continue to think.
The best time to plant a tree was 20 years ago. The second best time is now. Age is another factor which the physicists particularly worry about. They always are saying that you have got to do it when you are young or you will never do it. Einstein did things very early, and all the quantum mechanic fellows were disgustingly young when they did their best work. Most mathematicians, theoretical physicists, and astrophysicists do what we consider their best work when they are young.
…
You may find yourself as I saw Brattain when he got a Nobel Prize. The day the prize was announced we all assembled in Arnold Auditorium; all three winners got up and made speeches. The third one, Brattain, practically with tears in his eyes, said, “I know about this Nobel-Prize effect and I am not going to let it affect me; I am going to remain good old Walter Brattain.” Well I said to myself, “That is nice.” But in a few weeks I saw it was affecting him. Now he could only work on great problems.
Turning the problem on its head I think that if you look carefully you will see that often the great scientists, by turning the problem around a bit, changed a defect to an asset. For example, many scientists when they found they couldn’t do a problem finally began to study why not.
They then turned it around the other way and said, \u0026ldquo;But of course, this is what it is\u0026rdquo; and got an important result. So ideal working conditions are very strange. The ones you want aren\u0026rsquo;t always the best ones for you.\nBeing driven I worked for ten years with John Tukey at Bell Labs. He had tremendous drive. One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode\u0026rsquo;s office and said, \u0026ldquo;How can anybody my age know as much as John Tukey does?\u0026rdquo; He leaned back in his chair, put his hands behind his head, grinned slightly, and said, \u0026ldquo;You would be surprised Hamming, how much you would know if you worked as hard as he did that many years.\u0026rdquo; I simply slunk out of the office!\n\u0026hellip;\nWhat Bode was saying was this: Knowledge and productivity are like compound interest. Given two people of approximately the same ability and one person who works ten percent more than the other, the latter will more than twice outproduce the former. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity \u0026mdash; it is very much like compound interest. I don\u0026rsquo;t want to give you a rate, but it is a very high rate. Given two people with exactly the same ability, the one person who manages day in and day out to get in one more hour of thinking will be tremendously more productive over a lifetime.\nWhat you know could be wrong There\u0026rsquo;s another trait on the side which I want to talk about; that trait is ambiguity. It took me a while to discover its importance. Most people like to believe something is or is not true. Great scientists tolerate ambiguity very well. They believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory. If you believe too much you\u0026rsquo;ll never notice the flaws; if you doubt too much you won\u0026rsquo;t get started. It requires a lovely balance.\n\u0026hellip;\nDarwin writes in his autobiography that he found it necessary to write down every piece of evidence which appeared to contradict his beliefs because otherwise they would disappear from his mind. When you find apparent flaws you\u0026rsquo;ve got to be sensitive and keep track of those things, and keep an eye out for how they can be explained or how the theory can be changed to fit them. Those are often the great contributions.\nCreativity Now again, emotional commitment is not enough. It is a necessary condition apparently. And I think I can tell you the reason why. Everybody who has studied creativity is driven finally to saying, \u0026ldquo;creativity comes out of your subconscious.\u0026rdquo; Somehow, suddenly, there it is. It just appears. Well, we know very little about the subconscious; but one thing you are pretty well aware of is that your dreams also come out of your subconscious.\n\u0026hellip;\nIf you are deeply immersed and committed to a topic, day after day after day, your subconscious has nothing to do but work on your problem. And so you wake up one morning, or on some afternoon, and there\u0026rsquo;s the answer. For those who don\u0026rsquo;t get committed to their current problem, the subconscious goofs off on other things and doesn\u0026rsquo;t produce the big result. 
So the way to manage yourself is that when you have a real important problem you don\u0026rsquo;t let anything else get the center of your attention \u0026mdash; you keep your thoughts on the problem. Keep your subconscious starved so it has to work on your problem, so you can sleep peacefully and get the answer in the morning, free.\nLunch Over on the other side of the dining hall was a chemistry table. I had worked with one of the fellows, Dave McCall; furthermore he was courting our secretary at the time. I went over and said, \u0026ldquo;Do you mind if I join you?\u0026rdquo; They can\u0026rsquo;t say no, so I started eating with them for a while. And I started asking, \u0026ldquo;What are the important problems of your field?\u0026rdquo; And after a week or so, \u0026ldquo;What important problems are you working on?\u0026rdquo; And after some more time I came in one day and said, \u0026ldquo;If what you are doing is not important, and if you don\u0026rsquo;t think it is going to lead to something important, why are you at Bell Labs working on it?\u0026rdquo; I wasn\u0026rsquo;t welcomed after that; I had to find somebody else to eat with!\nImportant Problems Let me warn you, \u0026ldquo;important problem\u0026rdquo; must be phrased carefully. The three outstanding problems in physics, in a certain sense, were never worked on while I was at Bell Labs. By important I mean guaranteed a Nobel Prize and any sum of money you want to mention. We didn\u0026rsquo;t work on (1) time travel, (2) teleportation, and (3) antigravity. They are not important problems because we do not have an attack. It\u0026rsquo;s not the consequence that makes a problem important, it is that you have a reasonable attack. That is what makes a problem important.\nGreat Thoughts Time Along those lines at some urging from John Tukey and others, I finally adopted what I called \u0026ldquo;Great Thoughts Time.\u0026rdquo; When I went to lunch Friday noon, I would only discuss great thoughts after that. By great thoughts I mean ones like: \u0026ldquo;What will be the role of computers in all of AT\u0026amp;T?\u0026rdquo;, \u0026ldquo;How will computers change science?\u0026rdquo;\nOpen Door Policy Another trait, it took me a while to notice. I noticed the following facts about people who work with the door open or the door closed. I notice that if you have the door to your office closed, you get more work done today and tomorrow, and you are more productive than most. But 10 years later somehow you don\u0026rsquo;t know quite know what problems are worth working on; all the hard work you do is sort of tangential in importance. He who works with the door open gets all kinds of interruptions, but he also occasionally gets clues as to what the world is and what might be important.\nIt ain\u0026rsquo;t what you do, it\u0026rsquo;s the way that you do it. I was doing the required integration by a rather crummy method, to say the least, but I was getting the answer. And I realized that in truth the problem was not just to get the answer; it was to demonstrate for the first time, and beyond question, that I could beat the analog computer on its own ground with a digital machine. 
I reworked the method of solution, created a theory which was nice and elegant, and changed the way we computed the answer; the results were no different.\nThe published report had an elegant method which was later known for years as \u0026ldquo;Hamming\u0026rsquo;s Method of Integrating Differential Equations.\u0026rdquo; It is somewhat obsolete now, but for a while it was a very good method. By changing the problem slightly, I did important work rather than trivial work.\n\u0026hellip;\nYou should do your job in such a fashion that others can build on top of it, so they will indeed say, \u0026ldquo;Yes, I\u0026rsquo;ve stood on so and so\u0026rsquo;s shoulders and I saw further.\u0026rdquo; The essence of science is cumulative. By changing a problem slightly you can often do great work rather than merely good work.\n\u0026hellip;\nTo end this part, I\u0026rsquo;ll remind you, \u0026ldquo;It is a poor workman who blames his tools \u0026mdash; the good man gets on with the job, given what he\u0026rsquo;s got, and gets the best answer he can.\u0026rdquo;\nSelling There are three things you have to do in selling. You have to learn to write clearly and well so that people will read it, you must learn to give reasonably formal talks, and you also must learn to give informal talks.\nTalks While going to meetings I had already been studying why some papers are remembered and most are not. The technical person wants to give a highly limited technical talk. Most of the time the audience wants a broad general talk and wants much more survey and background than the speaker is willing to give. As a result, many talks are ineffective. The speaker names a topic and suddenly plunges into the details he\u0026rsquo;s solved. Few people in the audience may follow. You should paint a general picture to say why it\u0026rsquo;s important, and then slowly give a sketch of what was done. Then a larger number of people will say, \u0026ldquo;Yes, Joe has done that,\u0026rdquo; or \u0026ldquo;Mary has done that; I really see where it is; yes, Mary really gave a good talk; I understand what Mary has done.\u0026rdquo;\nWorth it? Well I now come down to the topic, \u0026ldquo;Is the effort to be a great scientist worth it?\u0026rdquo; To answer this, you must ask people. When you get beyond their modesty, most people will say, \u0026ldquo;Yes, doing really first-class work, and knowing it, is as good as wine, women and song put together,\u0026rdquo; or if it\u0026rsquo;s a woman she says, \u0026ldquo;It is as good as wine, men and song put together.\u0026rdquo;\nWhy people fail? Drive and Commitment Well, one of the reasons is drive and commitment. The people who do great work with less ability but who are committed to it, get more done that those who have great skill and dabble in it, who work during the day and go home and do other things and come back and work the next day. They don\u0026rsquo;t have the deep commitment that is apparently necessary for really first-class work. They turn out lots of good work, but we were talking, remember, about first-class work.\nPersonality Defects Good scientists will fight the system rather than learn to work with the system and take advantage of all the system has to offer. It has a lot, if you learn how to use it. It takes patience, but you can learn how to use the system pretty well, and you can learn how to get around it. After all, if you want a decision \u0026lsquo;No\u0026rsquo;, you just go to your boss and get a \u0026lsquo;No\u0026rsquo; easy. 
If you want to do something, don\u0026rsquo;t ask, do it. Present him with an accomplished fact. Don\u0026rsquo;t give him a chance to tell you \u0026lsquo;No\u0026rsquo;.\n\u0026hellip;\nBy taking the trouble to tell jokes to the secretaries and being a little friendly, I got superb secretarial help.\nEgo Assertion You should dress according to the expectations of the audience spoken to. If I am going to give an address at the MIT computer center, I dress with a bolo and an old corduroy jacket or something else. I know enough not to let my clothes, my appearance, my manners get in the way of what I care about. An enormous number of scientists feel they must assert their ego and do their thing their way. They have got to be able to do this, that, or the other thing, and they pay a steady price.\n\u0026hellip;\nOn the other hand, we can\u0026rsquo;t always give in. There are times when a certain amount of rebellion is sensible. I have observed almost all scientists enjoy a certain amount of twitting the system for the sheer love of it.\nExcuses Now self-delusion in humans is very, very common. There are innumerable ways of you changing a thing and kidding yourself and making it look some other way. When you ask, \u0026ldquo;Why didn\u0026rsquo;t you do such and such,\u0026rdquo; the person has a thousand alibis. If you look at the history of science, usually these days there are ten people right there ready, and we pay off for the person who is there first. The other nine fellows say, \u0026ldquo;Well, I had the idea but I didn\u0026rsquo;t do it and so on and so on.\u0026rdquo; There are so many alibis. Why weren\u0026rsquo;t you first? Why didn\u0026rsquo;t you do it right? Don\u0026rsquo;t try an alibi.\nSteve Jobs: Letter to Himself This is a live blog and is expected to be updated frequently.\nThe content in video is slightly different from the essay. Like Hamming says in the video, he has delivered this talk at many places by various names.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/wise/","summary":"\u003cp\u003eLife is short for learning from your own mistakes; you need to play catch-up with people who tried new things. Most people do not document their learnings. The rare culturati group that notes their understandings in an essay easily trumps the large group, which keeps their learnings to themselves.\u003c/p\u003e\n\u003cp\u003eI have read many interesting essays. Some of them stuck with me \u0026mdash; like fingers working with super glue. I revisit them often. When I reread them, I often see myself clinging to an awfully good section. In this live post, I will share some of those good nuggets.\u003c/p\u003e","title":"Bullets of Wisdom"},{"content":" What are Support Vector Machines (SVM)? Support vector machines are supervised learning models that analyse data to find patterns useful in classification and regression. They are versatile: they can identify non-linear relationships, work with discrete and continuous data, and are used for two-class classification, multi-class classification as well as regression. They are remarkable for unifying geometric theory, elegant mathematics, and theoretical guarantees with practical solid use cases.\nThey provide several specific benefits.\nWith the use of Kernel functions, they are highly effective in higher dimensional spaces.\nWhen the number of dimensions is larger than the number of samples, SVM can still be used. 
However, one has to be careful with the chosen regularisation parameter ($C$ in this article) and the Kernel function.
Only a subset of the training data (called support vectors) is used for prediction. Therefore, retaining all the training information in memory is unnecessary, so the prediction process doesn't slow down.
History Vladimir Vapnik and colleagues developed the theory for SVM at AT&T Bell Laboratories1 in 1963. SVMs are among the most robust prediction methods, built on a statistical learning framework called VC Theory developed by Vapnik and Chervonenkis.
Properties SVMs with a soft margin are examples of an empirical risk minimization (ERM) algorithm with the hinge loss function. SVMs belong to a natural class of algorithms for statistical inference, which also happens to produce really good predictions.
Inference + Prediction = Data Science. What more can you ask?
Vapnik showing off his ERM framework, taking a jibe at Bayesian statistics. (ERM formula at top.)
How do they work? Geometrically, SVM tries to find a linear hyperplane that separates the data into two classes.
Consider the following example where I'm selecting two species of Iris flowers and plotting their sepal width and sepal length. The colour represents the species.
library(tidyverse) theme_set(ggthemes::theme_clean()) p = iris |> filter(Species != "versicolor") |> ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() p There seems to be a clear separation between the two species. Can we draw a (straight) line that separates them?
p = p + geom_segment(aes(x = 4, y = 2, xend = 7, yend = 4.5), colour = 4, lty = 2, alpha = 0.7) p Except for one setosa which is misclassified, we got them all right.
However, there are infinitely many other lines possible.
p = p + geom_segment(aes(x = 4.4, y = 2, xend = 6.5, yend = 4.5), colour = 5, lty = 2, alpha = 0.7) + geom_segment(aes(x = 5, y = 2, xend = 6.5, yend = 4), colour = 6, lty = 2, alpha = 0.7) p And many, many more.
Paradox of Choices Since there are so many choices in deciding the best model, we need to define the problem more rigorously.
What would be the "best" hyperplane separating the two classes? One way to visualize this problem is to think about how we could maximize the distance between the two classes. Therefore, the best partitioning hyperplane would maximize the distance between the two classes.
Think again of the classification problem we have.2
iris |> filter(Species != "versicolor") |> filter(Sepal.Length > 5) |> ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() What is a good "margin"? By visual examination, choose between the three options: which one is the best separating hyperplane?
Option A Option B Option C Mathematically… The middle line of the margin is \(w'x + b = 0\) while the two boundary lines are \(w'x + b = 1\) and \(w'x + b = -1\).
For any unseen point,
$$ f(x) = \begin{cases} 1 & \text{if } w'x + b \geq 1 \\ -1 & \text{if } w'x + b \leq -1 \end{cases} $$
The margin width is \(\frac{2}{||w||}\), which has to be maximized.
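Where does that width come from? A quick sketch, using the standard point-to-hyperplane distance: a point \(x_0\) lies at distance \(|w'x_0 + b| / ||w||\) from the plane \(w'x + b = 0\). A support vector on one boundary satisfies \(w'x_0 + b = 1\), so it sits \(1/||w||\) away from the middle line, and the same holds on the other side. The total width is therefore
$$ \frac{1}{||w||} + \frac{1}{||w||} = \frac{2}{||w||}. $$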
Maximizing \(\frac{2}{||w||}\) is equivalent to minimizing \(\frac{||w||^2}{2}\) (squaring does not change the minimiser, but it keeps the problem smooth and convex), subject to the constraints:
$$ f(x) = \begin{cases} 1 & \text{if } w'x + b \geq 1 \\ -1 & \text{if } w'x + b \leq -1 \end{cases} $$
This is a constrained optimization problem that can be solved via many methods (numerical methods, quadratic programming, etc.).
What if they're not separable? In our dummy example, I removed two points. But that is usually not a good idea. Can you exclude points from your data because they're hard to classify?3
That's a blunder for two reasons.
First, we want to build a model that works for all data points — including extreme data points. We will not know if a test point is an extreme point. Second, how will you decide which points to remove? If you remove all tough cases, why even use SVM? A simple linear regression can do a reasonably good job of predicting some points. Let's take a look at a problem when the classes are not perfectly separable.
This one "blue" point is being misclassified. Can we do something about it?
Here come slack variables to the rescue… Slack variables ($\xi$) add a "padding" around the margin, which varies by observation. For data on the wrong side of the margin, the modified objective function's value is proportional to its distance from the margin.
This is called a "soft" margin.
Optimisation Problem $$ \min L(w) = \frac{||w||^2}{2} + C\left( \sum_{i = 1}^N \xi_i^k \right) $$
subject to the constraints
$$ f(x_i) = \begin{cases} 1 & \text{if } w'x_i + b \geq 1 - \xi_i \\ -1 & \text{if } w'x_i + b \leq -1 + \xi_i \end{cases}. $$
Another alternative: Non-linear SVM What if the data has a non-linear trend, like the example below? A linear hyperplane does not make sense at all in that case.
We can map our features to a new feature space where they are linearly separable. Recall that we usually take the natural logarithm of wealth before using it in linear regression. The concept is similar, except that this idea is far more expansive and works in many cases.
Kernel Functions We can also create non-linear classifiers by applying the "kernel trick". It is a well-known technique in statistics that maps data from a lower-dimensional space into a higher-dimensional one. Generally, it is easier to spot clear decision boundaries in higher dimensions.
The resulting algorithm is similar, except that a non-linear kernel function replaces every dot product. Then, the algorithm can fit the maximum-margin hyperplane in a transformed feature space.
Note that the transformation might be non-linear, and the new space can be high dimensional. The classifier will be a linear hyperplane in the new space but might be non-linear in the original input space.
Some Common Kernels Polynomial Kernel $$ k(x_i, x_j) = (x_i' x_j + 1)^d, $$
when \(d = 1\), this is the linear kernel; when \(d = 2\), the quadratic kernel.
Radial Basis Kernel / Gaussian Kernel $$ k(x_i, x_j) = \exp(-\gamma ||x_i - x_j||^2), $$
for all \(\gamma > 0\). When \(\gamma = 1/(2\sigma^2)\), the kernel is said to have width \(\sigma\). It is also known as the Radial Basis Function (RBF).
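To make these kernels concrete, here is a minimal R sketch of both as plain functions (poly_kernel and rbf_kernel are my own names, not from any package):
# Polynomial kernel: (x_i' x_j + 1)^d; d = 1 gives the linear kernel, d = 2 the quadratic one
poly_kernel = function(x_i, x_j, d = 2) (sum(x_i * x_j) + 1)^d
# Gaussian / RBF kernel: exp(-gamma * ||x_i - x_j||^2)
rbf_kernel = function(x_i, x_j, gamma = 1) exp(-gamma * sum((x_i - x_j)^2))
# Toy example: sepal measurements of two flowers
x1 = c(5.1, 3.5); x2 = c(4.9, 3.0)
poly_kernel(x1, x2) # similarity under the quadratic kernel
rbf_kernel(x1, x2) # similarity under the RBF kernel
Packages such as kernlab (the engine behind the ksvm model fitted later in this post) implement these same kernels in optimised form, plugging them in wherever the linear algorithm would use a plain dot product.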
Notes on SVM's Practical Usage
SVM performs well on average and can outperform most other techniques in many important applications.
The effectiveness of SVM in practice depends on (a) the choice of kernel, (b) the kernel's parameters, and (c) the soft-margin parameter \(C\). The Gaussian Kernel (or RBF) is a common choice for the kernel function; its \(\gamma\) has to be tuned.
Because the method is statistically oriented, the results are stable, reproducible and largely independent of the specific optimisation algorithm.
Being a convex optimisation problem, it leads to the global optimum.
Computational challenge: solving the optimisation problem has quadratic complexity. While this is not too bad, using kernel spaces increases the number of features, exacerbating the problem manyfold.
The SVM classifier doesn't provide posterior class probabilities the way logistic regression does; it simply classifies the point into a region. Many packages estimate posterior probabilities with cross validation, though.
Case Study: Classifying Type of Animals at a Zoo In this example, we will try to predict the type of an animal given its other characteristics using a linear SVM, aka vanilla SVM. This is the zoo data from the mlbench package.
Let's see the data.
data(Zoo, package = "mlbench") Zoo = as_tibble(Zoo) Zoo |> DT::datatable() Correlation Let's do some descriptive statistics and explore how the data looks. How do different types of animals vary? Can we see a quick correlation?
library(ggcorrplot) model.matrix(~0+., data = Zoo) |> cor(use="pairwise.complete.obs") |> ggcorrplot(show.diag = F, type="lower", lab=TRUE, lab_size=2) This can give us a lot of interesting insights!
Modelling Fitting Model But our time is limited, so let's jump to SVM. Recall that train() takes a formula as the first input, data as the second input, method as the third input, and other training controls after that.
library(caret) svmFit = train( type ~., data = Zoo, method = "svmLinear", trControl = trainControl(method = "cv", number = 10) ) svmFit ## Support Vector Machines with Linear Kernel ## ## 101 samples ## 16 predictor ## 7 classes: 'mammal', 'bird', 'reptile', 'fish', 'amphibian', 'insect', 'mollusc.et.al' ## ## No pre-processing ## Resampling: Cross-Validated (10 fold) ## Summary of sample sizes: 91, 90, 92, 89, 91, 92, ... ## Resampling results: ## ## Accuracy Kappa ## 0.97 0.9608681 ## ## Tuning parameter 'C' was held constant at a value of 1 Let's see details about the final model.
# storing final model svmFinal = svmFit$finalModel svmFinal ## Support Vector Machine object of class "ksvm" ## ## SV type: C-svc (classification) ## parameter : cost C = 1 ## ## Linear (vanilla) kernel function. ## ## Number of Support Vectors : 47 ## ## Objective Function Value : -0.1448 -0.218 -0.1484 -0.1754 -0.0936 -0.1033 -0.297 -0.0819 -0.1556 -0.0907 -0.1135 -0.182 -0.5763 -0.13 -0.1833 -0.118 -0.0474 -0.0823 -0.1236 -0.1481 -0.5666 ## Training error : 0 Predictions I will use predict() to calculate the predictions. I'm predicting on the training data, which is not advisable, but it shows how the SVM function works.
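If you wanted an honest out-of-sample estimate instead, you could hold out a test set before training. A minimal sketch with caret's createDataPartition (the object names are mine; everything else mirrors the fit above):
set.seed(542)
# hold out roughly 20% of the rows, stratified by animal type
idx = c(createDataPartition(Zoo$type, p = 0.8, list = FALSE))
zoo_train = Zoo[idx, ]
zoo_test = Zoo[-idx, ]
svmHoldout = train(type ~ ., data = zoo_train, method = "svmLinear", trControl = trainControl(method = "cv", number = 10))
# accuracy on animals the model has never seen
mean(predict(svmHoldout, newdata = zoo_test) == zoo_test$type)
With that caveat noted, back to the in-sample predictions.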
# creating predictions pred = predict(svmFit, newdata = Zoo) pred ## [1] mammal mammal fish mammal mammal ## [6] mammal mammal fish fish mammal ## [11] mammal bird fish mollusc.et.al mollusc.et.al ## [16] mollusc.et.al bird mammal fish mammal ## [21] bird bird mammal bird insect ## [26] amphibian amphibian mammal mammal mammal ## [31] insect mammal mammal bird fish ## [36] mammal mammal bird fish insect ## [41] insect bird insect bird mammal ## [46] mammal mollusc.et.al mammal mammal mammal ## [51] mammal insect amphibian mollusc.et.al mammal ## [56] mammal bird bird bird bird ## [61] fish fish reptile mammal mammal ## [66] mammal mammal mammal mammal mammal ## [71] mammal bird mollusc.et.al fish mammal ## [76] mammal reptile mollusc.et.al bird bird ## [81] reptile mollusc.et.al fish bird mammal ## [86] mollusc.et.al fish bird insect amphibian ## [91] reptile reptile fish mammal mammal ## [96] bird mammal insect mammal mollusc.et.al ## [101] bird ## Levels: mammal bird reptile fish amphibian insect mollusc.et.al Confusion Matrix # confusion matrix table(Zoo$type, pred) ## pred ## mammal bird reptile fish amphibian insect mollusc.et.al ## mammal 41 0 0 0 0 0 0 ## bird 0 20 0 0 0 0 0 ## reptile 0 0 5 0 0 0 0 ## fish 0 0 0 13 0 0 0 ## amphibian 0 0 0 0 4 0 0 ## insect 0 0 0 0 0 8 0 ## mollusc.et.al 0 0 0 0 0 0 10 Accuracy The model has a 100% accuracy on the training data.
# prediction accuracy sum(Zoo$type==pred)/nrow(Zoo) ## [1] 1 Post-notes This article was originally created for my guest lecture in Prof Charles Liu's BZAN 542: Data Mining Methods for Business Applications class. The lecture received a positive response and interesting questions, which improved this document.
I would've preferred using Tidymodels' SVM instead of caret. But since the entire class was designed with caret, I've followed the convention.
Now called Nokia Bell Labs. ↩︎
You would notice that I've removed two points, just for simplicity. ↩︎
Try saying "these points are hard to classify so I'll ignore them" to your client. Don't — unless you want to lose your job. ↩︎
","permalink":"/svm/","summary":"Support vector machines (SVM) are remarkable for the unification of geometric theory, elegant mathematics, and theoretical guarantees with strong practical use cases. In this blog post, I demonstrate certain properties of SVM and how to use them with the caret package in R.","title":"A Gentle Introduction to using Support Vector Machines for Classification"},{"content":"In our econometrics class a few months ago, Prof Luiz showed us a video to demonstrate how difference-in-difference-in-difference works. To measure the impact of some policy (or treatment in economic-speak), we can compare the outcomes before and after the policy.
Difference in Difference in Difference Let's say McDonald's believes adding cheese to its Deluxe Chicken Sandwich1 would increase sales. One fine day — April 1, 2022 — they added cheese to their sandwich in their Knoxville store. In March 2022, they sold 10,000 units. Six months later, they measured sales again. They sold 15,000 units in October 2022. The analyst was ecstatic with the results.
However, the excitement was short-lived. The Nashville manager turned up and said, "We sold 12,000 sandwiches in March; we sold 15,000 in October. How can you say this increment of 5,000 sandwiches was because of that extra cheese?
Were Nashville customers cheesing anyway?"
But come on, Knoxville's sales increased by 5,000 sandwiches (15,000 - 10,000). Nashville's sales increased by 3,000 sandwiches (15,000 - 12,000). Knoxville's sales increased by 2,000 more than Nashville's! I have to attribute it to the extra cheese!
Assuming both cities are otherwise similar, this has to be true. I know Nashville is much larger than Knoxville, but if they were comparable, the effect of cheese is clear. This method of analysis is called difference in differences: each city's before-and-after change is the first difference, and comparing the two changes is the second, giving (15,000 - 10,000) - (15,000 - 12,000) = 2,000 extra sandwiches attributable to cheese. Add one more layer of comparison, as the study below does, and you get difference in difference in difference.
Gifting Bicycles to School Girls in Bihar Bihar is one of the poorest states in India. Its overall literacy rate is 70%, but there's a stark difference between girls (50%) and boys (70%).2 In 2007, the government decided to distribute bicycles to all girls for free in the hope of raising school enrollment.3 The program was called Mukhyamantri Balika Cycle Yojna (Chief Minister's Programme on Cycle for Girls).
The principal of a school summed it up perfectly:
If the girl can ride a bicycle, she can come to school with little difficulty. She can help her family in everyday chores. She can take up economic opportunities. She can travel far for extra classes and private tuitions. Most importantly, she can dream of herself as an empowered girl who can be an engineer, a doctor, anything.
Two professors, Karthik Muralidharan (University of California, San Diego) and Nishith Prakash (University of Connecticut), were hired by the International Growth Center to study the economic impact. Their findings are best described in this video.4
The researchers compared the enrollment before and after the free distribution of bicycles. The enrollment had gone up significantly!
Many could argue that the enrollment was rising anyway. There could be a myriad of reasons why the enrollment went up: higher economic growth, better schools, changing attitudes towards education.
So, the researchers compared this growth with the growth in enrollment of boys. Almost all the reasons listed above would also be affecting boys' enrollment. That's where difference-in-differences comes in.
But this gross comparison also assumes that enrollment for both girls and boys was growing at the same rate before and after. That's simply not true. Therefore, they compared the rise in enrollment with a neighbouring state — Jharkhand (my home state). Jharkhand and Bihar are culturally very similar to each other. In fact, Jharkhand was part of Bihar until 2000.
Using the difference in difference in difference approach, they found that the program was super effective. Girls' enrollment rose about three times more in Bihar than in Jharkhand.
Conclusion This video explains two things: the difference-in-differences approach to analysing systematic changes, and the impact of having a little more freedom in our lives. Once the girls had the bicycles, they could travel to other places and help their parents in errands and jobs. They could go to the next city and open their bank accounts.
Atomic improvements lead to major changes.
It's my go-to McD product, other than their latte. Personally, I prefer McD's coffee over any other chain, especially Starbucks (in India).
Recently, I started grinding my own coffee beans to make lattes, and that's so much better than any coffee sold at Starbucks, McD or the like. ↩︎
From Wikipedia:
Bihar has a total literacy rate of 69.83%. Overall Male and Female literacy rate is 70.32% and 53.57% respectively. Total Rural literacy rate is 43.9%. In rural areas of Bihar, Male and Female literacy rate is 57.1% and 29.6% respectively. Total Urban literacy rate is 71.9%. In urban areas of Bihar, Male and Female literacy rate is 79.9% and 62.6% respectively. ↩︎
There was also an option to get ₹2,000 cash to buy a bicycle. Some people preferred their own model over what the government offered. ↩︎
It mildly offends me that the map in the picture shows Pakistan-occupied Kashmir (PoK) as part of Pakistan and not India. With a deep breath, I will let it go. ↩︎
","permalink":"/bihar/","summary":"Bihar is one of the poorest states in India. Its overall literacy rate is 70%, but there's a stark difference between girls (50%) and boys (70%). In 2007, the government decided to distribute bicycles to all girls for free in the hope of raising school enrollment. The program was called Mukhyamantri Balika Cycle Yojna (Chief Minister's Programme on Cycle for Girls). The program had astonishing results.","title":"How the government of Bihar is changing lives one girl at a time"},{"content":"The allure of coffee has always been a constant in my life, somewhat like a well-worn book that continually unveils new chapters. I've not been a fan of Chai (चाय) like most Indians. But the rich aroma of South Indian filter coffee holds a special place in my heart, brewed traditionally and delivering a strong, aromatic decoction. During the pandemic, I also had a short fling with Dalgona coffee.
This summer, Dea added a refreshing twist to my coffee narrative. We ventured through the cafes of Portland, tasting various brews and inviting a whole new spectrum of flavors into my coffee experience. Initially, I struggled to discern the distinctive tastes of different coffees. However, gradually, I found myself identifying the subtle bitterness of light roasts and the rich, chocolatey nuances of dark roasts.
It became apparent to me that coffee possesses an underestimated complexity. Take, for instance, the difference between coffee blends and single-origin variants. Blends, although budget-friendly, often compromise taste, a fact evident in the offerings of most cafes, including giants like Starbucks. Yet, in a taste showdown, the robust flavors of single-origin coffee tend to prevail.
Intrigued by this depth, I found myself yearning to delve deeper, brewing my own gourmet coffee. And when I mention "coffee", it's lattes that I favor over espresso. While this choice might limit my café exploration, it certainly does not curb the joy of my journey.
Taking Dea's advice, I subscribed to Trade Coffee. This service delivers a new pack of coffee every few weeks, perfectly timed with my last pack's completion. I rate each coffee I taste, helping them refine their selection and tailor their recommendations to my liking.
I've expanded my brewing repertoire with a Moka pot and a Bodum milk frother, on the suggestion of James Hoffmann.
These additions have only enhanced my brewing experience at home.
So here I am, penning down my coffee experiences, finding a renewed appreciation for the intricate world of coffee. This journey, far from over, has deepened my relationship with this beloved beverage.
Here's to many more cups, to endless discoveries, and to the simple joy that a well-brewed cup brings! ☕
","permalink":"/rediscovering-coffee/","summary":"How did I find a new hobby of making coffee and surprising everyone with my methods? By making bad coffee.","title":"Rediscovering Coffee: A Newfound Hobby"},{"content":" See the discussion notes below or here.
","permalink":"/up-and-running-with-r-markdown/","summary":"Introduction to R Markdown for MS (Business Analytics) Class of Fall 2022","title":"Up and Running with R Markdown"},{"content":"Watch the slides below or here.
","permalink":"/using-github-with-rstudio/","summary":"Introduction to using Git & GitHub with RStudio for MS (Business Analytics) Class of Fall 2022","title":"Using GitHub with RStudio"},{"content":"
People tell me I'm too optimistic. Maybe. But why not?
Things are bad. We know that; everyone knows that. But you can choose to look at the positives. Granted, we'll sometimes be cherry-picking. But not always. Life is like a box of chocolates. There will be good things about every negative thing and bad about most positive things. I want to be happy, so I look at the positives.
But the situation isn't as dire either. Historian Howard Zinn beautifully explains why we need to be a little more hopeful.
To be hopeful in bad times is not just foolishly romantic. It is based on the fact that human history is a history not only of cruelty, but also of compassion, sacrifice, courage, kindness.
What we choose to emphasize in this complex history will determine our lives. If we see only the worst, it destroys our capacity to do something. If we remember those times and places—and there are so many—where people have behaved magnificently, this gives us the energy to act, and at least the possibility of sending this spinning top of a world in a different direction.
And if we do act, in however small a way, we don't have to wait for some grand utopian future. The future is an infinite succession of presents, and to live now as we think human beings should live, in defiance of all that is bad around us, is itself a marvelous victory.
The crowd is often angry not at how things happen but at the individuals involved. Usually, it is because we already have an opinion on how things should be. We are irritated to see it done another way. Our assumptions tell us that may not work, even though we know there are many paths to the same destination.
It is maybe worthwhile to understand that they are trying their best. Or maybe they didn't realise the extent of their actions. Do you think through all your decisions before you take them?
You might say they're leaders and they're entrusted with the responsibility to do so. Yes. I'm saying, give them the benefit of the doubt.
There's a small story in The Little Prince, where the prince is on a tour of the world. On the first asteroid that he visits, he meets a strange king — a benevolent dictator. He has an uncontrollable urge to command everything. However, unlike present-day dictators, he is wise and prudent.
When the prince meets him after a long journey, he yawns.
"It is contrary to the etiquette to yawn in the presence of a king", the monarch said to him. "I forbid you to do so!"
"I can't help it. I can't stop myself", replied the Little Prince, thoroughly embarrassed. "I have come on a long journey, and I have had no sleep."
"Ah then" the king said. "I order you to yawn. It is years since I have seen anyone yawning. Yawns, to me, are objects of curiosity. Come, now! Yawn again! It is an order."
The king expects an obedient pupil. He is disobeyed but with respect.
The king doesn't realise that he wants a charming subject. Everyone following his benevolent rules quickly gets boring. It isn't that he wouldn't like order in his state, but if everyone follows his rules strictly, then it is uninteresting.
Later on in the story, he is impressed by the prince's naturalness. He offers him the position of Minister of Justice in his court. The little prince declines: "there is nobody here to judge".
One of the reasons we choose to look at the negatives is that we judge the results instead of being curious about the path that led to them.
The final reason why we should look at the positives is that it is genuinely more fun. My friend told me she didn't like the Netflix adaptation of Jane Austen's Persuasion. In its attempt to appeal to the current generation of viewers, the producers had diluted the fabric of the story. Many elements that stood out — Anne Elliot's innocence and inability to take a stand for herself — are far more vivid in the novel than in the movie.
Dialogues have changed too. "Now we're worse than exes, we're friends": no one would say that in the nineteenth century. The colour-blind casting takes away authenticity. It is difficult to imagine such a diverse household at that time in history. She's mostly right.
What she does miss is that the movie, nonetheless, is fun. The dialogue about exes and friends made me laugh out loud. It is true the language doesn't fit nineteenth-century Britain, but does it have to? The elements of the movie add up for entertainment, not authenticity. Wouldn't we be happier if we enjoyed the movie as it is, without notions of what it should be like? Dakota Johnson certainly pulls it off.
Next time you encounter an unexpected situation, cut them some slack. Be a little more optimistic. Maybe they will miss the deadline, but they will still do it. In the long run, that's always a better problem to have.1 Prof Sean is right: "Be one standard deviation more positive than the most positive person you know".
This idea of missing the deadline but achieving the target comes from Gwynne Shotwell, President and COO of SpaceX. ↩︎
","permalink":"/optimistic/","summary":"Things are bad. We know that; everyone knows that. But you can choose to look at the positives. Granted, we'll sometimes be cherry-picking. But not always. Life is like a box of chocolates; there will be good things about every negative thing and bad about most positive things.
I want to be happy, so I look at the positives.","title":"Being Optimistic"},{"content":" A problem with learning in public is staying constantly alert to what counts as heresy. Back in the seventeenth century, if you said anything against God, you would be penalised, even if the statement were true. Sometimes it meant death. If you don't believe me, just ask Galileo.
Today, we seem to be going back to those ideals. The "mass", which is difficult to identify, would put you through Twitter trials. Twitter — like a true public square — would decide based on majority opinion, not the truth. Like all public justice, these moments are about exerting power rather than arriving at the truth.
And what is truth anyway? Are nameless, faceless Twitter users with a million followers ambassadors of truth? Does getting a thousand likes make a statement the truth? In some cases, yes. In others, no. What does it depend on? Heresy.
When a person says an argument is "x-ist", they imply that's the end of the discussion. They usually do not explain why the argument is "x-ist". Even if they do, they miss the point: is the argument true? In fact, one of the reasons these labels are used is that they're a means to an end. A device to avoid discussion.
These devices have collateral damage. The person in question might lose their job, their face, or both. These "x-ist" accusations discount everything else that the speaker has done. Somehow, this one (assumed) blunder overtakes all their positive contributions to society.
This heresy is also judged differently from other ideas. If I say I have bad taste, I'll still live to see another day. But if I say something about transgender people, I'm suddenly the worst person alive. I will at least get cancelled.
'People who menstruate.' I'm sure there used to be a word for those people. Someone help me out. Wumben? Wimpund? Woomud? Opinion: Creating a more equal post-COVID-19 world for people who menstruate https://t.co/cVpZxG7gaA
— J.K. Rowling (@jk_rowling) June 6, 2020 This is no different from seventeenth-century Papal rule or many Arab countries today. These days, it usually costs people their jobs. The outcome is usually less severe in the short term but far more intense in the long term.
A crime is a crime is a crime. No matter what good you did, you'll be in jail if you break a rule. Today, heresy is the same. The cost of having an unacceptable opinion, albeit true, is high.
The information age has multiplied the information available at our disposal. It has also increased how much exposure I have. What I do gets registered permanently - even the Government of India warns me to think twice before posting something online. This public attitude is certainly a degradation. Having some opinions is considered not merely mistaken but guilty - asking for punishment.
Why is the centuries-old phenomenon seeing a sudden rebirth?
Good question. First, let's understand who the intolerant people are, the ones who make a maverick's life miserable. If there were a 2x2 grid to describe people, one axis could be independent-mindedness; the other axis could be aggressiveness of opinions.1
Aggressive people are those who assert their opinions heavily. Passive people would rather listen and be sheep about it. A majority of people are somewhere in the middle of the spectrum. But like all things, those at the aggressive end would be the first to express their opinions.
Independent-minded people base their thoughts on currently available facts. Current is fresh: experience is the only valid evidence. Orthodox people are conventional-minded; they value traditions over innovation.
Being independent-minded is hard. Thinking is not easy, let alone thinking for ourselves. Giving advice isn't the way out when you have skin in the game. Ralph Waldo Emerson agrees. Here's a lucid podcast if you're not in a mood to read.
People in the top-right quadrant are the ones I'm wary of. When I see distant signs of that tendency, I will avoid speaking with them about anything more than what I absolutely have to. How to identify them? They think the way to change the world is to be judgemental about other people. They like to "call out" people with utter disregard for everything else they have to say.
Another reason why we're seeing a rebirth of them is something that's dear to me: the internet.2 There were always intolerant people, no matter how far you go back in history, in any society with a reasonably large population. But transmitting information from one section of society to the next was hard. Now, there's Twitter.
I have a simple rule for interactions online. My limit for textual arguments is three exchanges. If I can't convince the other person, or they can't convince me in three notes, I invite them to have a face-to-face conversation.3 We could meet in person, via video call, or a phone call — in that order of preference. The fourth message is, "Sorry, I don't argue online with more than three messages. Let's meet in person to take this further".
This rule has saved me countless hours of fighting with keyboard warriors. You wouldn't believe how common it is for people to back down at this stage. The ease of typing makes people believe they have stronger opinions than they actually do. To date, only two people have ever taken up the offer to meet in person. In the first case, we agreed we had a disagreement of values, which makes us both right in our own places. In the second case, we realised we weren't really in a conflict.
In many cases, the stone-pelters are just mean people. There is no dearth of mean people. Meanness isn't rare. In fact, one of the things the internet has shown us is how mean people can be.4 A few decades ago, only famous people and professional writers got to publish their opinions. Now everyone can, and we can all see the long tail of meanness that had previously been hidden.
There's one important difference between the old and the new wave. The intolerant activists in the seventeenth century came from the right-aligned groups; today, they come from left-aligned groups.
Why? Heresy requires purist viewpoints. Back in the seventeenth century, orthodox believers supported strict Christian doctrines. Today, most youngsters do not believe in god. As Sadhguru said, heaven has collapsed. Youngsters believe in moral and ideological purity. The right is slowly catching up, though.
Personally, what I am worried about is what happens when I get called out. I don't take my life seriously, period. I'm afraid I'll likely respond to them on a comical note. Perhaps also with a link to this essay. Hopefully, I don't end up like Dave Chappelle's friend Daphne Dorman, who committed suicide trying to defend Dave from the Trans community.
Be curious.
Not judgmental.
- Walt Whitman5
This model comes from Paul Graham's essay: The Four Quadrants of Conformism. ↩︎
Anil Dash's talk on "The Web We Lost" describes this perfectly. ↩︎
This is for text messages, usually on WhatsApp. If it were on letters or emails, I might think differently. Indians love arguing. My family would routinely discuss politics, policy, life philosophy and just about anything in our WhatsApp group. When we get together in person, it's no different. Just bring up one new government policy and you'd see clashes from both ends. Most of it is superficial: we don't carry our disagreements on politics to heart and happily share dinners and dances later. ↩︎
An example of meanness from the early days of the internet is the story of Hunter Moore, who ran Is Anyone Up? — a revenge porn website. He had the audacity to join a talk show while sharing the stage with women whose lives had been devastated by the website. ↩︎
The quote is apparently misattributed. Whatever. ↩︎
","permalink":"/public-square/","summary":"A problem with learning in public is staying constantly alert to what counts as heresy. Back in the seventeenth century, if you said anything against God, you would be penalised, even if the statement were true. Sometimes it meant death. If you don't believe me, just ask Galileo.","title":"Stoned to Death at the Public Square"},{"content":"Recently, I learned a neat trick during my internship at HP.
Sometimes you only need a few changes in an existing script before rerunning it all. For example, while working on a periodically executed project, we needed to run the same set of Jupyter notebooks every month. The code didn't change, barring a few parameters.
Usually, I recommend keeping the parameters at the beginning, so you notice what you need to change readily. But sometimes, you cannot avoid changes midway through the script.
How do you identify all the changes you need to make manually before hitting "restart kernel and run all" (in Jupyter Notebooks) or "Source" (in RStudio)?
Hi Monkey 🐵 Use a Script Monkey in your codebase at all locations where you need to change things manually. All that involves is writing an additional comment saying "Script Monkey". Later, search for all monkeys in the script and make the changes. Simple.
# Script monkey: Add current month and lag import pandas as pd df = pd.DataFrame({'month': ['2022-01', '2022-02', '2022-03'], 'lag': [1, 2, 3]}) Adding a small comment with #Script Monkey will save you hours looking through the code. Just Cmd + F (⌘ + F on Mac or Ctrl + F on Windows) for "monkey", and you will know what to keep track of!
Beyond Scripts Scripts are only the beginning. Later on, you might need to modify more things.
In that case, use Data Monkey, Tuning Monkey, Timing Monkey – and more!
Data Monkey 🐵 In data analytics or machine learning projects, data modifications are inevitable as new data streams in or data structures evolve.
# Data monkey: Update data source data = pd.read_csv('new_data_source.csv') # Data monkey: Add new features data['new_feature'] = data['existing_feature1'] * data['existing_feature2'] Utilize "Data Monkey" comments to mark places where data sources or features may need updates. A quick search for "Data Monkey" will guide you to all data-related modifications at once.
Tuning Monkey 🐵 Parameter tuning is crucial for optimizing model performance.
# Tuning monkey: Update hyperparameters from sklearn.ensemble import RandomForestClassifier model = RandomForestClassifier(n_estimators=100, max_depth=5) By marking parameter tuning sections with "Tuning Monkey," you can swiftly locate and adjust model parameters, streamlining the tuning process.
Timing Monkey 🐵 Project timelines often shift, impacting deadlines and schedules.
# Timing monkey: Update project month project_month = '2023-11-01' Employ "Timing Monkey" comments to highlight date- or time-sensitive code segments, aiding in keeping project timelines accurate and up-to-date.
Each Monkey variant simplifies managing different project elements, helping maintain a clean, organized, and efficient workflow.
Monkey will find its way to you.1
Featured image credit: Cute monkey vector created by catalyststuff - www.freepik.com ↩︎
","permalink":"/script-monkey/","summary":"Use a Script Monkey in your codebase at all locations where you need to change things manually. All that involves is writing an extra comment saying 'Script Monkey'. Later on, search for all monkeys in the script and make the changes. Simple.","title":"Script monkey! 🐒"},{"content":"How much more do men earn doing the same job as women? In this exploration, I will examine whether the gender pay gap exists, in which jobs, and how large it is. Specifically, this dataset is from the United Kingdom. It was part of the #tidytuesday event and can be downloaded from this link.
This online tool lets you visualise the difference by gender and occupation. If you're feeling brave, try this quiz too.
Let's begin knitr::opts_chunk$set(collapse = TRUE, out.width = "100%") library(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✔ ggplot2 3.3.6.9000 ✔ purrr 0.3.4 ## ✔ tibble 3.1.7 ✔ dplyr 1.0.9 ## ✔ tidyr 1.2.0 ✔ stringr 1.4.0 ## ✔ readr 2.1.2 ✔ forcats 0.5.1 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() library(DT) ggthemr::ggthemr('dust') paygap_raw = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-06-28/paygap.csv") ## Rows: 48711 Columns: 27 ## ── Column specification ──────────────────────────────────────────────────────── ## Delimiter: "," ## chr (9): employer_name, address, post_code, company_number, sic_codes, com... ## dbl (15): employer_id, diff_mean_hourly_percent, diff_median_hourly_percent...
## lgl (1): submitted_after_the_deadline ## dttm (2): due_date, date_submitted ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. glimpse(paygap_raw) ## Rows: 48,711 ## Columns: 27 ## $ employer_name \u0026lt;chr\u0026gt; \u0026#34;Bryanston School, Incorporated\u0026#34;, \u0026#34;RED BA… ## $ employer_id \u0026lt;dbl\u0026gt; 676, 16879, 17677, 682, 17101, 687, 17484… ## $ address \u0026lt;chr\u0026gt; \u0026#34;Bryanston House, Blandford, Dorset, DT11… ## $ post_code \u0026lt;chr\u0026gt; \u0026#34;DT11 0PX\u0026#34;, \u0026#34;EH6 8NU\u0026#34;, \u0026#34;LS7 1AB\u0026#34;, \u0026#34;TA6 3J… ## $ company_number \u0026lt;chr\u0026gt; \u0026#34;00226143\u0026#34;, \u0026#34;SC016876\u0026#34;, \u0026#34;10530651\u0026#34;, \u0026#34;0672… ## $ sic_codes \u0026lt;chr\u0026gt; \u0026#34;85310\u0026#34;, \u0026#34;47730\u0026#34;, \u0026#34;78300\u0026#34;, \u0026#34;93110\u0026#34;, \u0026#34;5621… ## $ diff_mean_hourly_percent \u0026lt;dbl\u0026gt; 18.0, 2.3, 41.0, -22.0, 13.4, 15.1, 15.0,… ## $ diff_median_hourly_percent \u0026lt;dbl\u0026gt; 28.2, -2.7, 36.0, -34.0, 8.1, 2.8, 0.0, 0… ## $ diff_mean_bonus_percent \u0026lt;dbl\u0026gt; 0.0, 15.0, -69.8, -47.0, 41.4, 77.6, 0.0,… ## $ diff_median_bonus_percent \u0026lt;dbl\u0026gt; 0.0, 37.5, -157.2, -67.0, 43.7, 71.2, 0.0… ## $ male_bonus_percent \u0026lt;dbl\u0026gt; 0.0, 15.6, 50.0, 25.0, 8.7, 5.8, 0.0, 0.0… ## $ female_bonus_percent \u0026lt;dbl\u0026gt; 0.0, 66.7, 73.5, 75.0, 3.2, 4.2, 0.0, 0.0… ## $ male_lower_quartile \u0026lt;dbl\u0026gt; 24.4, 20.3, 0.0, 56.0, 29.1, 42.6, 10.0, … ## $ female_lower_quartile \u0026lt;dbl\u0026gt; 75.6, 79.7, 100.0, 44.0, 70.9, 57.4, 90.0… ## $ male_lower_middle_quartile \u0026lt;dbl\u0026gt; 50.8, 25.4, 2.0, 52.0, 49.4, 45.2, 9.0, 5… ## $ female_lower_middle_quartile \u0026lt;dbl\u0026gt; 49.2, 74.6, 98.0, 48.0, 50.6, 54.8, 91.0,… ## $ male_upper_middle_quartile \u0026lt;dbl\u0026gt; 49.2, 10.3, 11.0, 30.0, 22.8, 46.8, 10.0,… ## $ female_upper_middle_quartile \u0026lt;dbl\u0026gt; 50.8, 89.7, 89.0, 70.0, 77.2, 53.2, 90.0,… ## $ male_top_quartile \u0026lt;dbl\u0026gt; 51.5, 18.1, 23.0, 24.0, 58.2, 35.5, 9.0, … ## $ female_top_quartile \u0026lt;dbl\u0026gt; 48.5, 81.9, 77.0, 76.0, 41.8, 64.5, 91.0,… ## $ company_link_to_gpg_info \u0026lt;chr\u0026gt; \u0026#34;https://www.bryanston.co.uk/employment\u0026#34;,… ## $ responsible_person \u0026lt;chr\u0026gt; \u0026#34;Nick McRobb (Bursar and Clerk to the Gov… ## $ employer_size \u0026lt;chr\u0026gt; \u0026#34;500 to 999\u0026#34;, \u0026#34;250 to 499\u0026#34;, \u0026#34;250 to 499\u0026#34;,… ## $ current_name \u0026lt;chr\u0026gt; \u0026#34;BRYANSTON SCHOOL INCORPORATED\u0026#34;, \u0026#34;\\\u0026#34;RED B… ## $ submitted_after_the_deadline \u0026lt;lgl\u0026gt; FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, F… ## $ due_date \u0026lt;dttm\u0026gt; 2018-04-05, 2018-04-05, 2018-04-05, 2018… ## $ date_submitted \u0026lt;dttm\u0026gt; 2018-03-27 11:42:49, 2018-03-28 16:44:25… The variables that actually look at the differences here are the variables that contain \u0026ldquo;diff\u0026rdquo; in their name. 
Let's look at those variables in detail.
paygap_raw |> select(contains("diff")) ## # A tibble: 48,711 × 4 ## diff_mean_hourly_percent diff_median_hourl… diff_mean_bonus… diff_median_bon… ## <dbl> <dbl> <dbl> <dbl> ## 1 18 28.2 0 0 ## 2 2.3 -2.7 15 37.5 ## 3 41 36 -69.8 -157. ## 4 -22 -34 -47 -67 ## 5 13.4 8.1 41.4 43.7 ## 6 15.1 2.8 77.6 71.2 ## 7 15 0 0 0 ## 8 11.9 0 0 0 ## 9 13.4 8.5 62.9 0 ## 10 15.3 6.9 55.5 1.6 ## # … with 48,701 more rows There are four variables. The first two are differences in hourly pay (mean and median), and the last two are differences in bonus (mean and median). A positive number means men earn that much more than women in that company/organisation.
A useful variable is the SIC code, which stands for standard industrial classification of economic activities. It identifies the business that the company is operating in.
paygap_raw |> select(contains("sic")) ## # A tibble: 48,711 × 1 ## sic_codes ## <chr> ## 1 85310 ## 2 47730 ## 3 78300 ## 4 93110 ## 5 56210:70229 ## 6 93110:93130:93290 ## 7 86900:88100 ## 8 56290 ## 9 1470:10910 ## 10 10120 ## # … with 48,701 more rows As can be noticed, some companies have more than one SIC code. Let's separate them with the separate_rows() function from tidyr. Then, let's count them to see which ones are the most common.1
paygap_raw |> select(sic_codes) |> separate_rows(sic_codes, sep = ":") |> count(sic_codes, sort = TRUE) ## # A tibble: 639 × 2 ## sic_codes n ## <chr> <int> ## 1 1 6584 ## 2 85310 3020 ## 3 <NA> 2894 ## 4 82990 2588 ## 5 85200 2219 ## 6 84110 1886 ## 7 70100 1541 ## 8 86900 1246 ## 9 78200 1149 ## 10 86210 1074 ## # … with 629 more rows But what do these SIC codes mean? Let's find out! The CSV file is available at the UK government's website.
uk_sic_codes = read_csv("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/527619/SIC07_CH_condensed_list_en.csv") ## Rows: 731 Columns: 2 ## ── Column specification ──────────────────────────────────────────────────────── ## Delimiter: "," ## chr (2): SIC Code, Description ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
uk_sic_codes ## # A tibble: 731 × 2 ## `SIC Code` Description ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 01110 Growing of cereals (except rice), leguminous crops and oil seeds ## 2 01120 Growing of rice ## 3 01130 Growing of vegetables and melons, roots and tubers ## 4 01140 Growing of sugar cane ## 5 01150 Growing of tobacco ## 6 01160 Growing of fibre crops ## 7 01190 Growing of other non-perennial crops ## 8 01210 Growing of grapes ## 9 01220 Growing of tropical and subtropical fruits ## 10 01230 Growing of citrus fruits ## # … with 721 more rows The variable name needs to be cleaned.\nuk_sic_codes = uk_sic_codes |\u0026gt; janitor::clean_names() uk_sic_codes ## # A tibble: 731 × 2 ## sic_code description ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 01110 Growing of cereals (except rice), leguminous crops and oil seeds ## 2 01120 Growing of rice ## 3 01130 Growing of vegetables and melons, roots and tubers ## 4 01140 Growing of sugar cane ## 5 01150 Growing of tobacco ## 6 01160 Growing of fibre crops ## 7 01190 Growing of other non-perennial crops ## 8 01210 Growing of grapes ## 9 01220 Growing of tropical and subtropical fruits ## 10 01230 Growing of citrus fruits ## # … with 721 more rows Visualise Differences Which companies have the highest differences? paygap_raw |\u0026gt; slice_max(order_by = diff_median_hourly_percent, n = 10) |\u0026gt; select(employer_name) |\u0026gt; unique() ## # A tibble: 16 × 1 ## employer_name ## \u0026lt;chr\u0026gt; ## 1 Shrewsbury Academies Trust ## 2 ASH \u0026amp; LACY FINISHES LIMITED ## 3 BEERE ELECTRICAL SERVICES LIMITED ## 4 HARVEY NICHOLS (OWN BRAND) STORES LIMITED ## 5 HARVEY NICHOLS RESTAURANTS LIMITED ## 6 J.C.B.EARTHMOVERS LIMITED ## 7 J5C MANAGEMENT LIMITED ## 8 JCB COMPACT PRODUCTS LIMITED ## 9 JCB POWER SYSTEMS LIMITED ## 10 KALSI PLASTICS (UK) LIMITED ## 11 M. ANDERSON CONSTRUCTION LIMITED ## 12 PLAYNATION LIMITED ## 13 PSJ FABRICATIONS LTD ## 14 WALTERS RESOURCES LIMITED ## 15 ATFC LIMITED ## 16 HPI UK HOLDING LTD. J.C.B. is the only familiar name to me. Is the difference one of the highest because of the business it\u0026rsquo;s involved in? Construction sector doesn\u0026rsquo;t employ many women. (If you\u0026rsquo;re curious why there are 16 names when I asked for top-10, it\u0026rsquo;s because some companies/roles have equal pay difference.)\nWhich companies have the lowest differences? 
paygap_raw |> slice_min(order_by = diff_median_hourly_percent, n = 10) |> select(employer_name) |> unique() ## # A tibble: 10 × 1 ## employer_name ## <chr> ## 1 ANKH CONCEPTS HOSPITALITY MANAGEMENT LIMITED ## 2 NSS CLEANING LIMITED ## 3 G4S SECURE SOLUTIONS (UK) LIMITED ## 4 AUTO-SLEEPERS GROUP LIMITED ## 5 AUTO-SLEEPERS INVESTMENTS LIMITED ## 6 BAR 2010 LIMITED ## 7 INBRELLA LIMITED ## 8 DONALDSON TIMBER ENGINEERING LIMITED ## 9 FORTEL SERVICES LIMITED ## 10 SPRINGFIELD PROPERTIES PLC Some of these look like housekeeping companies.
Distribution of Hourly Pay Let's start by looking at the distribution of the median difference in hourly pay.
paygap_raw |> ggplot(aes(diff_median_hourly_percent / 100)) + geom_histogram(bins = 25) + scale_x_continuous(limits = c(-0.5, 0.5), labels = scales::percent) + ylim(c(0, 10000)) + labs(x = "Difference", y = "Count", caption = "A value of 10% implies that men earn 10% more hourly wage than women.", title = "Median Hourly Pay Difference") ## Warning: Removed 901 rows containing non-finite values (`stat_bin()`). ## Warning: Removed 2 rows containing missing values (`geom_bar()`). There are a lot more companies on the positive side than on the negative side.
Distribution of Bonus paygap_raw |> ggplot(aes(diff_median_bonus_percent / 100)) + geom_histogram(bins = 25) + scale_x_continuous(limits = c(-0.5, 0.5), labels = scales::percent) + ylim(c(0, 10000)) + labs(x = "Difference", y = "Count", caption = "A value of 10% implies that men earned 10% more bonus than women.", title = "Median Bonus Difference") ## Warning: Removed 19163 rows containing non-finite values (`stat_bin()`). ## Warning: Removed 2 rows containing missing values (`geom_bar()`). Ooooh. In most cases, the difference in bonus is zero. Let's see which companies have the highest difference in bonus.
paygap_raw |> mutate(diff_median_bonus_percent = diff_median_bonus_percent/100) |> slice_max(diff_median_bonus_percent, n = 10) |> select(contains("employer"), diff_median_bonus_percent) ## # A tibble: 10 × 4 ## employer_name employer_id employer_size diff_median_bon… ## <chr> <dbl> <chr> <dbl> ## 1 The Order of St. Augustine of the… 17584 250 to 499 40 ## 2 BOWDRAPER LIMITED 2275 500 to 999 38.5 ## 3 PRISM UK MEDICAL LIMITED 10055 250 to 499 3.36 ## 4 ROBINSON MEDICAL RECRUITMENT LIMI… 17323 500 to 999 3.24 ## 5 RED RECRUITMENT PARTNERSHIP LIMIT… 10332 Less than 250 3.17 ## 6 CARE BY US LTD 16116 500 to 999 3.12 ## 7 TRIFORDS LIMITED 12941 250 to 499 2.86 ## 8 VALE OF GLAMORGAN HOTEL LIMITED 13230 250 to 499 2.81 ## 9 TRAFFORD LEISURE COMMUNITY INTERE… 17413 250 to 499 1.92 ## 10 The Healthcare Management Trust 12394 250 to 499 1.9 What do we have here… The Order of St. Augustine of the Mercy of Jesus (Roman Catholic Church) has the highest difference in bonus: 40%. Bowdraper is a cleaning service company.
To proceed, I need to join the SIC code values to that data frame.
Before that, I have to separate the SIC codes, which are delimited by :.
paygap_joined = paygap_raw |> #select(employer_name, diff_median_hourly_percent, sic_codes) |> separate_rows(sic_codes, sep = ":") |> left_join(uk_sic_codes, by = c("sic_codes" = "sic_code")) paygap_joined ## # A tibble: 71,943 × 28 ## employer_name employer_id address post_code company_number sic_codes ## <chr> <dbl> <chr> <chr> <chr> <chr> ## 1 Bryanston School, Inc… 676 Bryans… DT11 0PX 00226143 85310 ## 2 RED BAND CHEMICAL COM… 16879 19 Smi… EH6 8NU SC016876 47730 ## 3 123 EMPLOYEES LTD 17677 34 Rou… LS7 1AB 10530651 78300 ## 4 1610 LIMITED 682 Trinit… TA6 3JA 06727055 93110 ## 5 1879 EVENTS MANAGEMEN… 17101 The Su… SR5 1SU 07743495 56210 ## 6 1879 EVENTS MANAGEMEN… 17101 The Su… SR5 1SU 07743495 70229 ## 7 1LIFE MANAGEMENT SOLU… 687 Ldh Ho… PE27 4AA 02566586 93110 ## 8 1LIFE MANAGEMENT SOLU… 687 Ldh Ho… PE27 4AA 02566586 93130 ## 9 1LIFE MANAGEMENT SOLU… 687 Ldh Ho… PE27 4AA 02566586 93290 ## 10 1ST HOME CARE LTD. 17484 Real L… KY12 7LG SC272838 86900 ## # … with 71,933 more rows, and 22 more variables: ## # diff_mean_hourly_percent <dbl>, diff_median_hourly_percent <dbl>, ## # diff_mean_bonus_percent <dbl>, diff_median_bonus_percent <dbl>, ## # male_bonus_percent <dbl>, female_bonus_percent <dbl>, ## # male_lower_quartile <dbl>, female_lower_quartile <dbl>, ## # male_lower_middle_quartile <dbl>, female_lower_middle_quartile <dbl>, ## # male_upper_middle_quartile <dbl>, female_upper_middle_quartile <dbl>, … Let's see how many unique descriptions there are for the SIC codes.
paygap_joined |> count(description, sort = TRUE) ## # A tibble: 611 × 2 ## description n ## <chr> <int> ## 1 <NA> 10093 ## 2 General secondary education 3020 ## 3 Other business support service activities n.e.c. 2588 ## 4 Primary education 2219 ## 5 General public administration activities 1886 ## 6 Activities of head offices 1541 ## 7 Other human health activities 1246 ## 8 Temporary employment agency activities 1149 ## 9 General medical practice activities 1074 ## 10 Other service activities n.e.c. 841 ## # … with 601 more rows Many of them are similar to each other. "General secondary education" is very similar to "Primary education" — considering teachers as one group might be more meaningful for analysis.
This can be done using the tidytext package. Since I'm not interested in stop words, I will remove them.
There are also many missing descriptions; I'll remove them too.
library(tidytext) paygap_tokenized = paygap_joined |> unnest_tokens(word, description) |> anti_join(get_stopwords()) |> na.omit() ## Joining, by = "word" paygap_tokenized ## # A tibble: 129,419 × 28 ## employer_name employer_id address post_code company_number sic_codes ## <chr> <dbl> <chr> <chr> <chr> <chr> ## 1 Bryanston School, Inc… 676 Bryans… DT11 0PX 00226143 85310 ## 2 Bryanston School, Inc… 676 Bryans… DT11 0PX 00226143 85310 ## 3 Bryanston School, Inc… 676 Bryans… DT11 0PX 00226143 85310 ## 4 1610 LIMITED 682 Trinit… TA6 3JA 06727055 93110 ## 5 1610 LIMITED 682 Trinit… TA6 3JA 06727055 93110 ## 6 1610 LIMITED 682 Trinit… TA6 3JA 06727055 93110 ## 7 1879 EVENTS MANAGEMEN… 17101 The Su… SR5 1SU 07743495 56210 ## 8 1879 EVENTS MANAGEMEN… 17101 The Su… SR5 1SU 07743495 56210 ## 9 1879 EVENTS MANAGEMEN… 17101 The Su… SR5 1SU 07743495 56210 ## 10 1879 EVENTS MANAGEMEN… 17101 The Su… SR5 1SU 07743495 70229 ## # … with 129,409 more rows, and 22 more variables: ## # diff_mean_hourly_percent <dbl>, diff_median_hourly_percent <dbl>, ## # diff_mean_bonus_percent <dbl>, diff_median_bonus_percent <dbl>, ## # male_bonus_percent <dbl>, female_bonus_percent <dbl>, ## # male_lower_quartile <dbl>, female_lower_quartile <dbl>, ## # male_lower_middle_quartile <dbl>, female_lower_middle_quartile <dbl>, ## # male_upper_middle_quartile <dbl>, female_upper_middle_quartile <dbl>, … Let's see the most common words.
paygap_tokenized |> count(word, sort = T) ## # A tibble: 842 × 2 ## word n ## <chr> <int> ## 1 activities 11925 ## 2 n.e.c 5121 ## 3 manufacture 3905 ## 4 service 3235 ## 5 sale 2582 ## 6 support 2174 ## 7 business 2013 ## 8 specialised 1930 ## 9 motor 1876 ## 10 retail 1855 ## # … with 832 more rows There are 842 words. Some of them are useless, like "activities", "n.e.c", "general" and "non". I'll remove them. If I'm going to build any useful model, 842 categories are not going to be useful for me. Let's reduce them to, say, 40 words and call that top_words.
top_words = paygap_tokenized |> count(word) |> filter(!word %in% c("activities", "n.e.c", "general", "non")) |> slice_max(n, n = 40) |> pull(word) Let's take the tokenised dataset and filter only the top words.
Then, we will see how pay gaps differ across jobs.\npaygap = paygap_tokenized |\u0026gt; filter(word %in% top_words) |\u0026gt; transmute(diff_wage = diff_median_hourly_percent / 100, word) paygap ## # A tibble: 48,573 × 2 ## diff_wage word ## \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; ## 1 0.282 education ## 2 -0.34 facilities ## 3 0.081 management ## 4 0.081 consultancy ## 5 0.081 financial ## 6 0.081 management ## 7 0.028 facilities ## 8 0.028 facilities ## 9 0 human ## 10 0 health ## # … with 48,563 more rows Okay, now we are ready to analyse the differences.\nComparing by SIC Codes paygap_joined |\u0026gt; mutate(diff_wage = diff_median_hourly_percent / 100) |\u0026gt; group_by(description) |\u0026gt; summarise(diff_wage = mean(diff_wage)) |\u0026gt; arrange(desc(diff_wage)) ## # A tibble: 611 × 2 ## description diff_wage ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Factoring 0.355 ## 2 Manufacture of wiring devices 0.344 ## 3 Plumbing, heat and air-conditioning installation 0.333 ## 4 Banks 0.333 ## 5 Non-scheduled passenger air transport 0.331 ## 6 Binding and related services 0.312 ## 7 Activities of construction holding companies 0.312 ## 8 Manufacture of tools 0.308 ## 9 Security and commodity contracts dealing activities 0.307 ## 10 Electrical installation 0.304 ## # … with 601 more rows What is factoring? I\u0026rsquo;ve never heard of it. Here\u0026rsquo;s how the website describes it.\nSIC Code 64992: Factoring\nList of activities classified inside the UK SIC Code 64992\nDebt purchasing\nDiscount company (e.g. Debt factoring)\nFactoring company (buying book debts)\nInvoice discounting\nOther top contenders are manufacturing, plumbing services, etc.\nLet\u0026rsquo;s visualise the difference. Who doesn\u0026rsquo;t like pictures!\nIndustries with highest (average) hourly median difference paygap_joined |\u0026gt; mutate(diff_wage = diff_median_hourly_percent / 100) |\u0026gt; group_by(description) |\u0026gt; summarise(diff_wage = mean(diff_wage)) |\u0026gt; slice_max(diff_wage, n = 10) |\u0026gt; mutate(description = fct_reorder(description, diff_wage)) |\u0026gt; ggplot(aes(x = description, y = diff_wage)) + geom_point(alpha = 0.9, size = 2) + scale_x_discrete(labels = \\(x) stringr::str_wrap(x, width = 50)) + labs(x = \u0026#34;Industry SIC\u0026#34;, y = \u0026#34;Percentage\u0026#34;, caption = \u0026#34;A value of 10% implies that men earn 10% more than women.\u0026#34;, title = \u0026#34;Median Hourly Pay Difference\u0026#34;) + coord_flip() + theme(plot.title.position = \u0026#34;plot\u0026#34;) Industries with lowest (average) hourly median difference paygap_joined |\u0026gt; mutate(diff_wage = diff_median_hourly_percent / 100) |\u0026gt; group_by(description) |\u0026gt; summarise(diff_wage = mean(diff_wage)) |\u0026gt; slice_min(diff_wage, n = 10) |\u0026gt; mutate(description = fct_reorder(description, diff_wage)) |\u0026gt; ggplot(aes(x = description, y = diff_wage)) + geom_point(alpha = 0.9, size = 2) + scale_x_discrete(labels = \\(x) stringr::str_wrap(x, width = 50)) + labs(x = \u0026#34;Industry SIC\u0026#34;, y = \u0026#34;Percentage\u0026#34;, caption = \u0026#34;A value of -10% implies that men earn 10% less than women.\u0026#34;, title = \u0026#34;Median Hourly Pay Difference\u0026#34;) + coord_flip() + theme(plot.title.position = \u0026#34;plot\u0026#34;) The differences are lowest in services and manufacturing activities (factory work).\nThese industry classifications are confusing; they are too specific to be useful.\nLet\u0026rsquo;s
visualise the difference by words in the description. Recall that we stored it in the paygap data frame.\npaygap |\u0026gt; group_by(word) |\u0026gt; summarise(diff_wage = mean(diff_wage)) |\u0026gt; slice_max(diff_wage, n = 10) |\u0026gt; mutate(word = fct_reorder(word, diff_wage)) |\u0026gt; ggplot(aes(x = word, y = diff_wage)) + geom_point(alpha = 0.9, size = 2) + labs(x = NULL, y = \u0026#34;Percentage\u0026#34;, title = \u0026#34;Percentage increase in men\u0026#39;s hourly wages compared to women\u0026#39;s\u0026#34;) + coord_flip() + theme(plot.title.position = \u0026#34;plot\u0026#34;) Education has the highest wage difference.\nLet\u0026rsquo;s see which has the lowest wage difference.\npaygap |\u0026gt; group_by(word) |\u0026gt; summarise(diff_wage = mean(diff_wage)) |\u0026gt; mutate(word = fct_reorder(word, diff_wage)) |\u0026gt; slice_min(diff_wage, n = 10) |\u0026gt; ggplot(aes(x = word, y = diff_wage)) + geom_point(alpha = 0.9, size = 2) + labs(x = NULL, y = \u0026#34;Percentage\u0026#34;, title = \u0026#34;Percentage increase in men\u0026#39;s hourly wages compared to women\u0026#39;s\u0026#34;) + coord_flip() + theme(plot.title.position = \u0026#34;plot\u0026#34;) Management, business and transportation service businesses seem to have the smallest differences.\nThe average is not enough. Let\u0026rsquo;s fit a simple linear regression model. I\u0026rsquo;m forcing the intercept to be zero as I\u0026rsquo;m only interested in the differences attributable to each word; absent any word effect, the difference should be zero.\npaygap_fit = lm(diff_wage ~ 0 + word, data = paygap) broom::tidy(paygap_fit) ## # A tibble: 40 × 5 ## term estimate std.error statistic p.value ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 wordaccommodation 0.0285 0.00532 5.35 8.66e- 8 ## 2 wordagency 0.0752 0.00546 13.8 5.22e- 43 ## 3 wordbusiness 0.164 0.00314 52.3 0 ## 4 wordcare 0.0137 0.00525 2.61 8.97e- 3 ## 5 wordcars 0.157 0.00496 31.7 2.82e-218 ## 6 wordconstruction 0.209 0.00401 52.1 0 ## 7 wordconsultancy 0.195 0.00555 35.2 1.67e-268 ## 8 worddevelopment 0.184 0.00569 32.3 5.43e-226 ## 9 wordeducation 0.155 0.00458 33.9 1.53e-248 ## 10 wordemployment 0.0662 0.00504 13.1 2.49e- 39 ## # … with 30 more rows The ggstatsplot package provides beautiful ways to present these results.\nlibrary(ggstatsplot) ## You can cite this package as: ## Patil, I. (2021). Visualizations with statistical details: The \u0026#39;ggstatsplot\u0026#39; approach. ## Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167 names(paygap_fit$coefficients) = str_remove(names(paygap_fit$coefficients), \u0026#34;word\u0026#34;) ggcoefstats(paygap_fit, output = \u0026#34;plot\u0026#34;, sort = \u0026#34;descending\u0026#34;, stats.labels = FALSE, exclude.intercept = TRUE, only.significant = TRUE) + scale_y_discrete(labels = \\(x) stringr::str_wrap(x, width = 50)) ## size aesthetic has been deprecated for use with lines as of ggplot2 3.4.0 ## ℹ Please use linewidth aesthetic instead ## This message is displayed once every 8 hours. Primary education (and education in general) has the highest wage gap. Can anyone explain that to me? (Note that in the above plot, only significant variables are shown.)\nWhat about differences by industry?
I\u0026rsquo;ll keep only the top 10 industries and classify all others as \u0026ldquo;Other\u0026rdquo;.\npaygap_fit = paygap_joined |\u0026gt; mutate(diff_wage = diff_median_hourly_percent / 100, description = fct_lump_n(f = description, n = 10)) %\u0026gt;% lm(diff_wage ~ 0 + description, data = .) broom::tidy(paygap_fit) ## # A tibble: 11 × 5 ## term estimate std.error statistic p.value ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 descriptionActivities of head offices 0.164 0.00399 41.1 0 ## 2 descriptionGeneral medical practice a… 0.107 0.00478 22.3 1.36e-109 ## 3 descriptionGeneral public administrat… 0.0515 0.00361 14.3 3.48e- 46 ## 4 descriptionGeneral secondary education 0.272 0.00285 95.5 0 ## 5 descriptionOther business support ser… 0.143 0.00308 46.4 0 ## 6 descriptionOther human health activit… 0.0280 0.00444 6.30 3.02e- 10 ## 7 descriptionOther service activities n… 0.103 0.00540 19.1 3.04e- 81 ## 8 descriptionPre-primary education 0.270 0.00579 46.7 0 ## 9 descriptionPrimary education 0.292 0.00333 87.7 0 ## 10 descriptionTemporary employment agenc… 0.0464 0.00462 10.0 1.09e- 23 ## 11 descriptionOther 0.111 0.000734 152. 0 Pictures!\nnames(paygap_fit$coefficients) = str_remove(names(paygap_fit$coefficients), \u0026#34;description\u0026#34;) ggcoefstats(paygap_fit, output = \u0026#34;plot\u0026#34;, sort = \u0026#34;descending\u0026#34;, stats.labels = FALSE, exclude.intercept = TRUE, only.significant = TRUE) + scale_y_discrete(labels = \\(x) stringr::str_wrap(x, width = 50)) The difference is least in unlicensed cafes and restaurants, healthcare facilities and social work areas. That\u0026rsquo;s something positive. (Note that in the above plot, only significant variables are shown.)\nHow does the hourly pay gap correspond to the bonus gap? Hourly Pay and Bonus by Employer I\u0026rsquo;m averaging the data we have for each employer. Each point represents a company. I\u0026rsquo;ve removed companies which had more than 50% difference in pay. It is sad in itself that such companies exist, but including them would distort our plots and hide the cases where we can have significant impact.\npaygap_employer = paygap_raw |\u0026gt; mutate(diff_median_bonus_percent = diff_median_bonus_percent/100, diff_median_hourly_percent = diff_median_hourly_percent) |\u0026gt; group_by(employer_name) |\u0026gt; summarise(diff_median_bonus_percent = mean(diff_median_bonus_percent, na.rm = TRUE), diff_median_hourly_percent = mean(diff_median_hourly_percent, na.rm = TRUE)) |\u0026gt; na.omit() paygap_employer ## # A tibble: 13,636 × 3 ## employer_name diff_median_bonu… diff_median_hou… ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 10 TRINITY SQUARE HOTEL LIMITED 0.545 10.3 ## 2 123 EMPLOYEES LTD -0.441 32.5 ## 3 123-REG LIMITED 0.402 18.1 ## 4 1509 GROUP 0 13.8 ## 5 1610 LIMITED -0.25 -35 ## 6 1825 FINANCIAL PLANNING AND ADVICE LIMITED 0.83 42.6 ## 7 1879 EVENTS MANAGEMENT LIMITED 0.437 8.1 ## 8 1LIFE MANAGEMENT SOLUTIONS LIMITED 0.392 -18.4 ## 9 1ST CHOICE STAFF RECRUITMENT LIMITED -2.39 -1 ## 10 1ST HOME CARE LTD.
0 0.1 ## # … with 13,626 more rows paygap_employer |\u0026gt; ggplot(aes(x = diff_median_hourly_percent/100, y = diff_median_bonus_percent/100)) + geom_point(alpha = 0.3, size = 3) + scale_x_continuous(limits = c(-0.5, 0.5), labels = scales::percent) + scale_y_continuous(limits = c(-0.5, 0.55), labels = scales::percent) + labs(x = \u0026#34;Hourly pay difference\u0026#34;, y = \u0026#34;Bonus pay difference\u0026#34;, caption = \u0026#34;Each point represents a company. I\u0026#39;ve removed companies which had more than 50% difference in pay.\u0026#34;, title = \u0026#34;How hourly pay and bonus difference vary by company\u0026#34;) + theme(plot.title.position = \u0026#34;plot\u0026#34;) ## Warning: Removed 197 rows containing missing values (`geom_point()`). Hourly Pay and Bonus by Industry We can also look at the differences by industry.\npaygap_industry = paygap_joined |\u0026gt; mutate(diff_median_bonus_percent = diff_median_bonus_percent/100, diff_median_hourly_percent = diff_median_hourly_percent) |\u0026gt; group_by(description) |\u0026gt; summarise(diff_median_bonus_percent = mean(diff_median_bonus_percent, na.rm = TRUE), diff_median_hourly_percent = mean(diff_median_hourly_percent, na.rm = TRUE)) |\u0026gt; na.omit() paygap_industry ## # A tibble: 609 × 3 ## description diff_median_bon… diff_median_hou… ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Accounting and auditing activities 0.179 12.2 ## 2 Activities auxiliary to financial intermed… 0.359 20.6 ## 3 Activities of amusement parks and theme pa… 0.187 3.75 ## 4 Activities of business and employers membe… -0.0625 8.77 ## 5 Activities of call centres 0.121 3.51 ## 6 Activities of collection agencies 0.250 8.74 ## 7 Activities of conference organisers 0.311 10.7 ## 8 Activities of construction holding compani… 0.448 31.2 ## 9 Activities of credit bureaus 0.318 -0.88 ## 10 Activities of distribution holding compani… 0.261 14.5 ## # … with 599 more rows paygap_industry |\u0026gt; ggplot(aes(x = diff_median_hourly_percent/100, y = diff_median_bonus_percent/100, label = description)) + geom_point(alpha = 0.3, size = 3) + scale_x_continuous(labels = scales::percent) + scale_y_continuous(labels = scales::percent) + labs(x = \u0026#34;Hourly pay difference\u0026#34;, y = \u0026#34;Bonus pay difference\u0026#34;, caption = \u0026#34;Each point represents an industry.\u0026#34;, title = \u0026#34;How hourly pay and bonus difference vary by industry\u0026#34;) + theme(plot.title.position = \u0026#34;plot\u0026#34;) The two outliers are \u0026ldquo;Manufacturer of ceramic tiles\u0026rdquo;, where women are paid 60% less bonus than men and have 16% lower hourly wages, and \u0026ldquo;Retail sale of bread, cakes, flour confectionary and sugar confectionary in specialised stores\u0026rdquo;, where women get 20% less bonus but 8% higher hourly wages.\nHourly Pay and Bonus by Industry-Word Recall that we found the most common words from the descriptions, which represent the general ideas.\npaygap_words = paygap_tokenized |\u0026gt; filter(word %in% top_words) |\u0026gt; transmute(diff_wage = diff_median_hourly_percent / 100, diff_bonus = diff_median_bonus_percent/ 100, word) |\u0026gt; group_by(word) |\u0026gt; summarise(diff_wage = mean(diff_wage), diff_bonus = mean(diff_bonus)) paygap_words ## # A tibble: 40 × 3 ## word diff_wage diff_bonus ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 accommodation 0.0285 0.0418 ## 2 agency 0.0752 -0.220 ## 3 business 0.164 0.0928 ## 4 care 0.0137
0.000628 ## 5 cars 0.157 0.347 ## 6 construction 0.209 -0.0564 ## 7 consultancy 0.195 0.243 ## 8 development 0.184 0.168 ## 9 education 0.155 -0.0361 ## 10 employment 0.0662 -0.386 ## # … with 30 more rows paygap_words |\u0026gt; ggplot(aes(x = diff_wage, y = diff_bonus, label = word)) + ggrepel::geom_text_repel(size = 3) + labs(x = \u0026#34;Hourly pay difference\u0026#34;, y = \u0026#34;Bonus pay difference\u0026#34;, title = \u0026#34;How does pay vary by industry?\u0026#34;) + theme(plot.title.position = \u0026#34;plot\u0026#34;) ## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider ## increasing max.overlaps It\u0026rsquo;s 12:59 am now and I\u0026rsquo;m sleepy. Probably will pick this up again some day.\nI somehow keep forgetting about count() and end up grouping and summarising, which is a much more complicated way of achieving the same thing. Probably, I need to think of them as equivalent to pandas\u0026rsquo; value_counts().\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/the-ascribed-advantage/","summary":"How does gender affect pay? In this short exploration, I use a #tidytuesday dataset provided by the UK Government to visualise gender gaps using R. And why are women working at churches paid 40% less bonus than men?","title":"The Ascribed Advantage"},{"content":"Last week, I received a note from a reader of my newsletter:\nDear Harshvardhan,\nI would like to say in advance that I enjoy your collection of R packages in your newsletter very much.\nToday, I saw that you featured the talk „The best stats you\u0026rsquo;ve ever seen\u0026quot; by Hans Rosling the second time in your newsletter, but I cannot say I share your enthusiasm about it. It could have been equally titled „How to lie with statistics\u0026quot;, as it takes quite some effort to (mis)lead the audience in a particular direction.\nTake the „Income Mountains\u0026quot;: By stacking different regions on top of each other the graph hides inequalities between groups. Even more important, regions are heterogeneous. Who would put Urugay, Chile or Costa Rica in the same group as Bolivia, Haiti or Guatemala?\nBut it gets worse when looking at the underlying transformation: Using logarithmic scale hides the long tail of rich people and thus extreme differences in income. (It is worth pointing out that the Gapminder foundation is sponsored by Bill Gates these days.)\nAnd the raw data are unfit for the purpose either: PPP as used by the World Bank \u0026ndash; Roslings data source \u0026ndash; overstates the purchasing power of the poor vis-a-vis the rich. Calculations of PPP invariably make poor people\u0026rsquo;s income look greater than it is.\nAnd all of this to convey the happy message that we do not live in a two-hump world anymore. Sadly, this is not the case and I believe putting the talk into context would have been appropriate for the newsletter. Something I actually urge you to do for the next issue.\nRead on: https://www.jasonhickel.org/blog/2019/3/17/two-hump-world\nBest,\nThey were referring to Hans Rosling\u0026rsquo;s talk on how our pre-conceived notions of the world are not aligned with general reality shown by data.\nDevelopment and growth are tricky subjects.1 They are complicated and do not seem to have one single answer.\nIn this TED talk, Rosling presents how when he started teaching as a professor of international health at the Karolinska Institute, his biggest problem was not the students\u0026rsquo; ignorance but their preconceived notions.
He asked his students to pick, from each pair of countries, the one with the higher child mortality rate.\nThe countries with the arrow actually have the higher child mortality rate. But the bigger lesson is that the students performed worse than a Chimpanzee (or a coin toss) and the professors performed only as well (or as badly) as a Chimpanzee.\nThe graph that stood out most to me was how life expectancy had varied by GDP per capita over the years. Here\u0026rsquo;s an updated version of the graph. Hit the play button.\nBut what they had written to me was about Income Mountains.\nMountain Peaks Rosling\u0026rsquo;s point was that incomes are converging to one \u0026ldquo;hump\u0026rdquo;, something that my reader and Jason Hickel disputed. Here\u0026rsquo;s how the graph looks in 2020.\nBut everything aside, it was great to hear some criticism of Hans Rosling\u0026rsquo;s work. I read his book Factfulness, which got me interested in telling stories with data. I hadn\u0026rsquo;t heard/seen significant criticism of his work \u0026mdash; which is dangerous, especially in development economics.2\nStacking Regions The stacked plots do hide the differences between the regions. Countries in the same group likely have very different economic indicators, which is one way to say the average is the average because it\u0026rsquo;s the average. In the figure below, I expanded his chart to all countries.\nOne obvious peril is that smaller countries like Uruguay are now almost hidden. However, the trend still looks the same. Both Uruguay\u0026rsquo;s hump and the global hump are very much aligned. Furthermore, all six countries mentioned in their email have a pretty similar distribution.\nLogarithmic Scales Logarithmic scales are hideous. They map larger differences onto smaller ones, making charts easy to read but not easy to interpret. However, one good thing about log scales is that they convey growth; they work on relative changes rather than absolute changes. Hans Rosling acknowledges it in the talk saying, \u0026ldquo;our concept of economy is to look at growth with percentage\u0026rdquo;.\nI\u0026rsquo;ve been fiddling with the idea of what happens if countries stop growing, and I\u0026rsquo;ve not settled on an answer yet. It was great to see that Jason Hickel (whose blog they shared) has a book on it: Less is More: How Degrowth Will Save the World and I will check it out.\nPurchasing Power Parity PPP is a wrong but valuable metric. Before starting my PhD, I worked for a few months in India. According to tax returns, my pay would\u0026rsquo;ve put me in the top 0.1% of India. During my PhD, I\u0026rsquo;m graciously supported with a scholarship. Just the scholarship amount, when converted to Indian rupees using the current exchange rate of ₹78:$1, would put me in the top 0.001% of India. And I\u0026rsquo;m making below US minimum wage on the scholarship.\nPPP exchange rates are not the complete picture. But they are a compromise between foreign exchange rates and no exchange rate. People argue for RER, REER, NEER, PER and many others, but each has its pros and cons.\nJason presents a graph with constant dollars. While eye-opening, I don\u0026rsquo;t think that\u0026rsquo;s very useful.\nThe blog talks about issues with making the baskets as well. That concern is well-founded. People consume different goods in different countries, and it\u0026rsquo;s tough to come up with the same basket.
Some economists suggested using time as a metric: consumption in a day measured in local currency.3 That has its shortcomings.\nThe broader picture from Rosling\u0026rsquo;s talk is about bringing forward the optimistic view of the world. His central argument is that economic conditions in developing countries are not as bad as they used to be. Does that mean we\u0026rsquo;ve reached our goal? Far from it. The Bill and Melinda Gates Foundation is working on it in many cases, including by supporting Gapminder.4\nLiving here in the US made me realise the importance of the resources one starts with. An average kid in developed countries has more options than her counterpart in the developing world. This means there are more questions than answers, which is why development economics is a growing field these days.\nHopefully, we will make it so that everyone can be rich.5\nUpdate: July 5, 2022 They responded to my note and this blog post, bringing forward another set of interesting points.\nIt is important to understand that logarithmic transformations are not trivial. As they correctly identified \u0026mdash; which most readers would miss \u0026mdash; logarithmic scales in graphs grossly underestimate actual differences.6\nFor example, the difference between $1 and $10 in this graph actually means a difference of $3 and $22,026. Why use logarithmic scales at all? Because incomes vary widely and a log-transform brings some order. Furthermore, growth is more easily captured with log numbers. But a difference of $22,023 is much more than one of $9, right?7\nOne could also ask whether relative change is actually what is of interest. It is often deemed appropriate with a reference to \u0026ldquo;diminishing returns\u0026rdquo;: the logarithmic scale incorporates people\u0026rsquo;s perception of the worth of additional income; giving a dollar to a person with a $1/day income is much more significant than giving one to a $10/day person. But to quote Jason Hickel: \u0026ldquo;Ultimately, there is a difference in perspectives at stake here. Additional dollars going to the rich are, from the perspective of the rich, diminishing in terms of marginal utility. But from the perspective of the poor they represent increasing egregiousness. To rely solely on the theory of diminishing marginal utility in discussing inequality, then, is to adopt the perspective of the rich and dress it up as neutral and objective.\u0026rdquo;\nIf you have a hundred dollars and I give you a hundred more, you\u0026rsquo;d be ecstatic. If you have a million and I give you a hundred more, you\u0026rsquo;d be meh\u0026hellip;\nTo hammer the point home, here\u0026rsquo;s what they say:\nWhen transforming and ultimately visualizing data, we are driven by questions, make omissions, simplifications and assumptions. It is not proper to omit them afterwards as more often than not we communicate \u0026ldquo;facts\u0026rdquo; through our very own lens.\nThanks for reading my blogpost. Thanks to the wonderful readers who take time to write back with such fervor. Such interactions motivate me to continue writing.\nNext \u0026mdash; Today I Learnt About R is a free weekly newsletter on R related stories. I present five stories, four packages, three jargon, two tweets and one meme related to Data Science and R.
Click here to subscribe or read past editions.\nProf Abhijit Banerjee\u0026rsquo;s lecture on The Challenge of World Poverty might be a good starting point.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSome people say development economics is all that\u0026rsquo;s left for novel economic work. No wonder many recent Nobel laureates in economics worked in the area of development economics.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDollar Street has several other visualisations: on alcohol, on pets, on toothbrushes and even toys.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nI don\u0026rsquo;t agree that Gapminder being supported by the Gates Foundation is the same as being sponsored by Bill Gates. Isn\u0026rsquo;t that akin to calling my $10 donation to Wikipedia a sponsorship? Also, Gapminder has apparently diversified its income sources to strengthen its economic independence. No single source of income is allowed to exceed 60% of the total annual income.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIf you\u0026rsquo;re looking for a shortcut, there are none. Here are some tips to get started on your long journey.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThey shared this amazing website which shows wealth in pixels. No log scales; just note the difference.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis description is wrong/incomplete. Here\u0026rsquo;s how Gapminder makes income mountains: https://www.gapminder.org/data/documentation/income-mountains-dataset/.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/the-best-stats-you-ve-ever-seen/","summary":"There\u0026rsquo;s a famous saying: All models are wrong but some are useful. How much of statistics is wrong and how much of it is useful? Some thoughts on Hans Rosling\u0026rsquo;s popular talk on global economic development and optimism.","title":"The best stats you've ever seen"},{"content":"\nWhat is Next? A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.\nYou can subscribe by providing your details here. Promise, no spam.\nIf you are unsure, here are some editions that my readers loved.\nThe best stats you\u0026rsquo;ve ever seen German Tank Problem, Unix Philosophy and Accidental aRt Data Science in Industry When Not to Use Machine Learning? Simulating Squid Game in R Here is a list of packages that I\u0026rsquo;ve covered in my letters. The list is updated every month.\n","permalink":"/newsletter/","summary":"\u003cp\u003e\u003cimg alt=\"Title Image Next - Today I Learnt About R\" loading=\"lazy\" src=\"/img/next.png\"\u003e\u003c/p\u003e\n\u003ch1 id=\"what-is-next\"\u003eWhat is Next?\u003c/h1\u003e\n\u003cblockquote\u003e\n\u003cp\u003eA short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eYou can subscribe by providing your details here.
Promise, no spam.\u003c/p\u003e\n\u003cdiv id=\"revue-embed\"\u003e\n  \u003cform action=\"https://www.getrevue.co/profile/harshbutjust/add_subscriber\" method=\"post\" id=\"revue-form\" name=\"revue-form\"  target=\"_blank\"\u003e\n  \u003cdiv class=\"revue-form-group\"\u003e\n    \u003clabel for=\"member_email\"\u003eEmail address\u003c/label\u003e\n    \u003cinput class=\"revue-form-field\" placeholder=\"Your email address...\" type=\"email\" name=\"member[email]\" id=\"member_email\"\u003e\n  \u003c/div\u003e\n  \u003cdiv class=\"revue-form-group\"\u003e\n    \u003clabel for=\"member_first_name\"\u003eFirst name \u003cspan class=\"optional\"\u003e(Optional)\u003c/span\u003e\u003c/label\u003e\n    \u003cinput class=\"revue-form-field\" placeholder=\"First name... (Optional)\" type=\"text\" name=\"member[first_name]\" id=\"member_first_name\"\u003e\n  \u003c/div\u003e\n  \u003cdiv class=\"revue-form-group\"\u003e\n    \u003clabel for=\"member_last_name\"\u003eLast name \u003cspan class=\"optional\"\u003e(Optional)\u003c/span\u003e\u003c/label\u003e\n    \u003cinput class=\"revue-form-field\" placeholder=\"Last name... (Optional)\" type=\"text\" name=\"member[last_name]\" id=\"member_last_name\"\u003e\n  \u003c/div\u003e\n  \u003cdiv class=\"revue-form-actions\"\u003e\n    \u003cinput type=\"submit\" value=\"Subscribe\" name=\"member[subscribe]\" id=\"member_submit\"\u003e\n  \u003c/div\u003e\n  \u003cdiv class=\"revue-form-footer\"\u003eBy subscribing, you agree with Revue’s \u003ca target=\"_blank\" href=\"https://www.getrevue.co/terms\"\u003eTerms of Service\u003c/a\u003e and \u003ca target=\"_blank\" href=\"https://www.getrevue.co/privacy\"\u003ePrivacy Policy\u003c/a\u003e.\u003c/div\u003e\n  \u003c/form\u003e\n\u003c/div\u003e\n\u003chr\u003e\n\u003cp\u003eIf you are unsure, here are some editions that my readers loved.\u003c/p\u003e","title":"Next — Today I learnt About R"},{"content":"It is sometime around 4000 BC, and almost all of India is on the verge of war.1 It is a war between two princes but is also a war between right and wrong \u0026mdash; dharma and adharma. The Pandavas, five righteous brothers led by Yudhistir, ask king Dhritrashtra for their fair share of land. The Kauravas, led by prince Duryodhan, are unwilling to share an inch of land.2\nDhritrashtra is lost in love for his hundred sons (the Kauravas) and doesn\u0026rsquo;t have the courage to stand up to the whims of the eldest one, Duryodhan.\nTherefore, a war ensues.\nOn the first day of the war, before the shankh is blown to announce the start, Arjuna asks his charioteer Krishna, who is god himself, for a tour of the battlefield. Krishna obliges and takes him to the middle of the field, what the modern world would call no-man\u0026rsquo;s-land. When Arjuna sees the vast land of Kurukshetra full of warriors prepared to take lives, with his uncles, brothers and friends fighting against him, he is unsettled.\nHe tells Krishna that this act of attacking his brothers, friends and gurus sounds like adharma in itself. How can I justify killing a million people only for a piece of land? And even if I win, what would I do with this land? I won\u0026rsquo;t have anyone I love to share it with. Saying this, he gives up his bow and says I can\u0026rsquo;t fight it.
It is not right; it\u0026rsquo;s not dharma.\nHe announces he will leave everything \u0026mdash; the war, the family, the palace, everything \u0026mdash; to live like a sage in the jungle at peace, far away from the war.\nArjuna\u0026rsquo;s inaction brings the entire war to a grinding halt.\nTo convince Arjun not to abandon his duty, Krishna begins with the most obvious tactic. He says: stop being a wimp. You are a Kshatriya (warrior), and you are from kuru-kul (the family of the greatest kings in all of Bharatwarsha). If you refuse to fight, all you will earn is shame.\nHe adds: no one, including you, my dear Parth, has anything to worry about.3 The body you\u0026rsquo;re residing in is nothing but a dress for the soul. What do we have to lose but our clothes?\nThe true self is invulnerable. Swords don\u0026rsquo;t cut him, fire doesn\u0026rsquo;t burn him, the wind doesn\u0026rsquo;t dry him, water doesn\u0026rsquo;t wet him, and arrows don\u0026rsquo;t pierce him.\nA true Kshatriya should fight not because he is guided by the thought of winning or losing, but because he has to fight. No one should act by keeping their eyes on the fruits of the action.\nThis last comment sets the stage for the rest of the song of the lord.\nArjuna is confused. Why strive to act at all if not to achieve some purpose? If the fruits of action aren\u0026rsquo;t relevant to my choice of action, why should I act at all? Why not become a renouncer?4\nKrishna offers a succinct reply. You cannot not act. Complete inaction is an impossibility. The forces of nature will ensure you act; it\u0026rsquo;s not your choice. You will breathe, and you will eat. Even if you make a point not to do either, your inaction will result in action.\nNo one who has a body can truly renounce action. He who renounces the fruits of his acts is the true renouncer. Inaction, too, could lead to fruits\u0026mdash;sins of omission.\nConsider the millions of soldiers fighting for you. Do you think their wives and mothers will accept your inaction? If your inaction \u0026mdash; the choice of not fighting in the war \u0026mdash; will result in their deaths, how is that inaction?\nBut Arjun remains confused. What should motivate him to do anything if he should not look for the fruits of the action?\nKrishna\u0026rsquo;s answer is philosophically self-evident. Think of me. Be truly devoted to me. He is the origin of the universe and the end of it. He is the creator of the world; he is the destroyer of it.\nKrishna is the eighth incarnation of Lord Vishnu, and Ramdhari Singh Dinkar best describes him in Rashmirathi.5\nहरि ने भीषण हुंकार किया, अपना स्वरूप-विस्तार किया, डगमग-डगमग दिग्गज डोले, भगवान् कुपित होकर बोले- \u0026lsquo;जंजीर बढ़ा कर साध मुझे, हाँ, हाँ दुर्योधन! बाँध मुझे।\nयह देख, गगन मुझमें लय है, यह देख, पवन मुझमें लय है, मुझमें विलीन झंकार सकल, मुझमें लय है संसार सकल। अमरत्व फूलता है मुझमें, संहार झूलता है मुझमें।\n\u0026lsquo;उदयाचल मेरा दीप्त भाल, भूमंडल वक्षस्थल विशाल, भुज परिधि-बन्ध को घेरे हैं, मैनाक-मेरु पग मेरे हैं। दिपते जो ग्रह नक्षत्र निकर, सब हैं मेरे मुख के अन्दर।\n\u0026lsquo;दृग हों तो दृश्य अकाण्ड देख, मुझमें सारा ब्रह्माण्ड देख, चर-अचर जीव, जग, क्षर-अक्षर, नश्वर मनुष्य सुरजाति अमर। शत कोटि सूर्य, शत कोटि चन्द्र, शत कोटि सरित, सर, सिन्धु मन्द्र।\n\u0026lsquo;शत कोटि विष्णु, ब्रह्मा, महेश, शत कोटि विष्णु जलपति, धनेश, शत कोटि रुद्र, शत कोटि काल, शत कोटि दण्डधर लोकपाल। जञ्जीर बढ़ाकर साध इन्हें, हाँ-हाँ दुर्योधन! बाँध इन्हें।
\u0026ldquo;But then, what is the true devotion?\u0026rdquo; asks Arjun. Krishna says that to be devoted to one\u0026rsquo;s karma is to be devoted to me. How do you devote yourself to me? Yoga.\nKrishna explains that there are three types of yoga.6 First is Bhakti-yoga, or the yoga of devotion. You can devote yourself to my worship, praising the Lord, the creator, the destroyer, nature and the world.\nSecond is Gyaan-yoga, when you leave everything else and focus on knowing yourself. Knowledge of self is the ultimate knowledge. The third is Karma-yoga, when you focus on your actions.\nHe adds:\nEvery human has three powers: the power to do things (बल), the power to know things (ज्ञान) and the power to believe (विश्वास). Those with a strong desire to do things are Karma-yogi, those with an intense curiosity to know themselves are Gyaan-yogi, and those willing to give up everything for the devotion of god are Bhakti-yogi.\nAll three routes lead to me. Any other path to me is also one of these three.\nI created the world to keep the wheel turning. Whether you like it or not, you have no choice, and the wheel will turn. What are you afraid of?\nBut, of course, all of this doesn\u0026rsquo;t quite convince Arjun.\nThen Krishna tells him to look around, and Arjun notices that everything is paused. Krishna explains that the narrative won\u0026rsquo;t resume unless Arjun picks up his bow and chooses to fight. The war, however destructive it may turn out to be (and it likely will be), is destined to take place, and Arjun is destined to play his role in it.\nThe events are foreordained and will happen as they should. From Krishna\u0026rsquo;s vantage point, the clues to shape the future are already present today. A negligent mind won\u0026rsquo;t realise it; a yogi won\u0026rsquo;t miss it. The events will occur as they must occur.\nI am death, destroyer of the worlds.7 Arjuna, you are going to fight; you might as well accept it and move on. Your actions don\u0026rsquo;t need motivation but devotion. You should do it because you have to do it, not because you\u0026rsquo;re awaiting its result.\nप्रकृति (Prakriti, or nature) and पुरुष (Purusha, or observer self) are two dials of the world. The doctrine of non-attached action can be explained by how a tortoise pulls himself within his shell, just as a Karma-yogi pulls his पुरुष towards प्रकृति to understand himself and perform actions by thinking about nature.8\nBy practising yoga, Arjun, you can learn about yourself \u0026mdash; your Purusha. At that stage, your understanding of Purusha will overpower the illusion of Prakriti. You will learn to see things as they are, not as your body wants you to believe.\nSince your action\u0026rsquo;s reactions unfold in this Prakriti, they cannot touch your Purusha \u0026mdash; your true self. It is untouched, unhindered and unstoppable.\nThis teaching, Krishna adds, is a Raj-vidya. It is for those who deserve it. Millions of years ago, I narrated this to the sun god; some berries of this tree made their way to the Vedas. Now, you must learn, understand and grow.\nTherefore, Arjuna, do the thing because you ought to.\nMaharaja Udipi chose not to fight on either side. Instead, he decided to cook food for all the soldiers. No one knew how many warriors would die every day, except apparently Udipi, who got the estimate right every time.
After the war, Prince Yudhisthir asked him how he got the perfect forecast every day.\nMaharaja Udipi explained that he gave counted peanuts to Lord Krishna every evening. Based on however many peanuts he left, he estimated the number of soldiers who would attain vir gati (or warrior\u0026rsquo;s death). If Sri Krishna left five peanuts, 5000 soldiers would die the next day. If he left 50 peanuts, it told 50,000 soldiers would die the next day.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nAs Bhagwat Gita later says, all wars are for one of the three necessities: money, land or woman (or man).\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nParth is another name of Arjun, given to him by Krishna. Obviously, I love this name \u0026mdash; perhaps more than Arjun itself.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nBeing a renouncer was considered the best option by many, including Ajivikas and Jains. Many Jains, for example, believed that the proper way to break from the cycle of life and death, the cycle of rebirth, was to rescind from life and take samadhi, a state where one stopped eating, drinking and breathing only to die on one\u0026rsquo;s own will.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nD.N. Pandey sir, my high school maths teacher, used to recite this poem while he gave us math problems to solve. The rigor with which I approached the problem definitely improved after listening to his energetic voice. Manoj Bajpayee\u0026rsquo;s recitation is probably the most energetic one I\u0026rsquo;ve listened to. You can listen it here with English subtitles.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nYoga stands for the union of thoughts, body and nature. Unfortunately, goat yoga doesn\u0026rsquo;t fall in either three categories.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis is the phrase from Gita that Oppenheimer quoted when he created the world\u0026rsquo;s first nuclear bomb.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIn Vedanta philosophy, प्रकृति is the prime material of which the world is made up of. All matter is part of प्रकृति.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/the-song-of-the-lord/","summary":"Shrimad Bhagavad Gita, or the song of the lord, is a 700 verses long conversation between Lord Sri Krishna and Arjun. It discusses key principles of action and embodies more wisdom than I can grasp.","title":"The Song of The Lord"},{"content":"I had always wondered about the ubiquity of unhealthy food in the US and the blatant absence of fresh fruits and vegetables. Why are supermarkets full of unhealthy chips, why does bread have corn syrup, and why do berries cost $9.99 and berries snacks cost $1.99!?\nThe answer might be misaligned incentives. You see, the farmers in the Great Depression-era were poor and helpless; had a regular tryst with droughts like every other country’s farmers. And most Americans were farmers. So, poor and vulnerable. The US government supported them with three initiatives. First, subsidies. The USDA found some easy to grow crops like wheat and corn, and helped farmers economically to develop them.\nAmazing help! 
Farm output in the US has nearly tripled since 1948.1 This actually brought the US out of malnutrition and even saved countries like India when we had regular droughts and famines in the 1960s and 70s.2 In fact, America is the largest agricultural exporter in the world!\nSecond, they also supported broad agricultural research, which helped create HYV seeds (but also Monsanto’s cancer-inducing chemicals).3\nHowever, the farmers clearly say that the chemical is not going anywhere.4\nMr. Bensend has been using that product, Roundup, on his 5,000 acres for 40 years, but he said that those blockbuster awards would not alter his farm practices one whit. Neither would the 20,000 lawsuits still pending.\n“Roundup is still a fabulous tool,” said Mr. Bensend, who grows corn, soybeans and alfalfa. He relies on Roundup’s key ingredient — glyphosate — to easily kill weeds, helping increase his yields and reduce his costs.\nThird, about which I don’t have much to say, was buying surplus produce from farmers.\nToday, the farmers are neither poor nor helpless. Less than 1% of Americans are farmers but the subsidies are still in place.5 It doesn’t look like they’re going anywhere.\nAlmost half of the subsidies go to the top-7 largest farm corporations in the US.\nWhen you subsidise something, you get a lot of it.\nIn the 1990s, the USDA was tasked with creating a food pyramid that would be printed in kids’ textbooks. What better way to tell them to be healthy? Unfortunately, the fantastic initiative was hijacked by lobbyists. The proportion of fruits and vegetables was reduced significantly to give space to grains and meat, according to Luise Light, former USDA Director of Dietary Guidance and Nutrition Education Research.6\nWhere we, the USDA nutritionists, called for a base of 5-9 servings of fresh fruits and vegetables a day, it was replaced with a paltry 2-3 servings (changed to 5-7 servings a couple of years later because an anti-cancer campaign by another government agency, the National Cancer Institute, forced the USDA to adopt the higher standard). Our recommendation of 3-4 daily servings of whole-grain breads and cereals was changed to a whopping 6-11 servings forming the base of the Food Pyramid as a concession to the processed wheat and corn industries.\nMoreover, my nutritionist group had placed baked goods made with white flour — including crackers, sweets and other low-nutrient foods laden with sugars and fats — at the peak of the pyramid, recommending that they be eaten sparingly. To our alarm, in the “revised” Food Guide, they were now made part of the Pyramid’s base. And, in yet one more assault on dietary logic, changes were made to the wording of the dietary guidelines from “eat less” to “avoid too much,” giving a nod to the processed-food industry interests by not limiting highly profitable “fun foods” (junk foods by any other name) that might affect the bottom line of food companies.\nThe USDA created a marketing division to work with fast food companies (among others) to support them. The dairy industry wanted a piece too. Soon, there was cheese everywhere. Everywhere.\nBut how much can you plaster a pizza with cheese? It’s already dripping. When they couldn’t find more places to put cheese, the US government invented the cheese-filled crust.7\nToday, unlike other corners of the fast food industry, pizza isn’t considered unhealthy when it actually is.
In fact, the US government spends millions to get people to eat more pizza.8\nFor the first few months I was in the US, I loved easy and accessible fast food. But soon after, I noticed changes in my body. I was lazier in general and felt lethargic. Since then, I started cooking more often and I do notice changes in my energy levels. The vegetables here aren’t as delicious as in India — ask an Indian about mangoes — but at least that’s better than the fast food junk that makes two out of three Americans fat.9\nWhen the government makes the choices, we have to wonder if the options benefit us or someone else. In this case, it clearly doesn’t help us. As people who like vegetables and fruits and would prefer to be healthy, can we do something?\nI was actually inspired to write this blog after watching Adam Conover’s Netflix documentary series (produced by former president Barack Obama). It’s an amazing series; do watch it if you can.\nA Look at Agricultural Productivity Growth in the United States, 1948–2017. (2020, March 5). USDA. https://www.usda.gov/media/blog/2020/03/05/look-agricultural-productivity-growth-united-states-1948-2017\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nI first learnt about this in my development economics class. We regularly requested US help for food grains. When HYV seeds became popular, this stopped but it wasn’t until the 2000s that we became self-sufficient.\nThe New York Times (1974, September 3). INDIA REQUESTING FOOD AID FROM U.S. The New York Times. https://www.nytimes.com/1974/09/03/archives/india-requesting-food-aid-from-us-seeks-emergency-help-but-shuns-a.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe World According To Monsanto is an amazing documentary recalling the practices of Monsanto and how it hurt us and farmers.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nCohen, Patricia. “Roundup Weedkiller Is Blamed for Cancers, but Farmers Say It’s Not Going Away (Published 2019).” The New York Times, 20 Sept. 2019, https://www.nytimes.com/2019/09/20/business/bayer-roundup.html.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIn international discussions, the US government regularly points out that developing countries like India reduce the number of subsidies given to farmers. But seriously? Most farmers in India are small — like US farmers in the Great Depression-era — and they need it.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nA Fatally Flawed Food Guide by Luise Light. http://www.whale.to/a/light.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nRainey, Clint. “The Mad Cheese Scientists Fighting to Save the Dairy Industry.” Bloomberg.com, Bloomberg, https://www.bloomberg.com/news/features/2017-07-19/the-mad-cheese-scientists-fighting-to-save-the-dairy-industry#xj4y7vzkg.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nPlumer, B. (2021, November 25). How the U.S. government spends millions to get people to eat more pizza. The Washington Post. Retrieved June 5, 2022, from https://www.washingtonpost.com/news/wonk/wp/2014/02/10/13-percent-of-americans-are-eating-pizza-on-any-given-day/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSifferlin, A. (2015, June 22). More Than Two Thirds of Americans Are Overweight or Obese. Time. Retrieved June 5, 2022, from https://time.com/3929990/americans-overweight-obese/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/food-choices-in-america/","summary":"I had always wondered about the ubiquity of unhealthy food in the US and the blatant absence of fresh fruits and vegetables.
Why are our supermarkets full of unhealthy chips, why does bread have corn syrup, and why do berries cost $9.99 and berries snacks cost $1.99!?","title":"Food Choices in America"},{"content":"माता गांधारी का मानना है की महाभारत मैंने करवाया। सिर्फ अपने अहम् के लिए। माँ क्या गलत हो सकती है?\nमैंने तो दुर्योधन को समझाने की कोशिश की। \u0026ldquo;दो न्याय अगर तो आधा दो, उसमें भी यदि बाधा हो तो देदो केवल पाँच ग्राम , रखो अपनी धरती तमाम\u0026rdquo;. मगर दुर्योधन में इतनी समझ कहाँ थी. महाराज धृतराष्ट्र की भरी सभा में, पितामाह भीष्म की उपास्थि में, आचार्य द्रोण और कृप के अंचल में, दुर्योधन मुझे ही बाँधने चला? क्या पूरी सभा यह भी भूल गयी की दूत निष्पक्ष होता है?\nशायद माता चौसर का खेल याद कर रही है। मगर मैं तो उस शाम था ही नहीं अन्यथा पांडवों और पांचाली के साथ ये अन्याय मैं होने ही नहीं देता। अगर माँ यह देख रही होती तो क्या वो यज्ञसेनी की चीरहरण होने देती? इस सवाल का जवाब सिर्फ काल के पास है, आखिर महाराज धृतराष्ट्र और पितामाह भीष्म ने भी तो विकर्ण को नहीं रोका। मैंने केवल द्रौपदी की रही-सही सम्मान बचायी। वो मैं आज भी करूँगा, कल भी करता, कल भी करूँगा।\nपांचाली अंगराज कर्ण के प्रति आकर्षित थी, मैंने उसे अर्जुन से मिलाया। क्यों? अर्जुन श्रेष्ठ धनुर्धर है। अर्जुन श्रेष्ठ योद्धा है। अर्जुन मेरा प्रिय है। वैसे भी, मेरे चाहने या बहलाने से क्या होता है? स्वयंवर अर्जुन जीता, कर्ण तो सूत-पुत्र था. फिर भी, इसमें मेरी क्या गलती माँ? मैं तो मेरी पांचाली के लिए अपना प्रिय ही खोजूंगा ना।\nशायद माँ युद्ध की बात कर रही. युद्ध का प्रथम कारण आपका पुत्र दुर्योधन है, इसमें मेरा क्या दोष? क्या मामा शकुनी ने कुछ कम छल किये हैं? क्या आपको अपने अनुज की वंचना नहीं दिखती?\nऐसा न करो माँ, मैं भी तुम्हारा पुत्र हूँ।\n","permalink":"/%E0%A4%A7%E0%A4%B0%E0%A5%8D%E0%A4%AE%E0%A4%95%E0%A5%8D%E0%A4%B7%E0%A5%87%E0%A4%A4%E0%A5%8D%E0%A4%B0-%E0%A4%AE%E0%A4%BE%E0%A4%A4%E0%A4%BE-%E0%A4%97%E0%A4%BE%E0%A4%82%E0%A4%A7%E0%A4%BE%E0%A4%B0%E0%A5%80-%E0%A4%95%E0%A4%BE-%E0%A4%86%E0%A4%B0%E0%A5%8B%E0%A4%AA/","summary":"माता गांधारी का मानना है की महाभारत श्रीकृष्ण ने करवाया। सिर्फ अपने अहम् के लिए।","title":"धर्मक्षेत्र: माता गांधारी का आरोप"},{"content":"Over the last few days, I dabbled with maps in R. Two days ago, I made a map of all the cities I\u0026rsquo;ve visited. Today, I thought to make street maps of some of them (and other cool cities).\nThe dark grey lines are highways and roadways, light grey lines are other streets and blue is water.\nJhumri Tilaiya, India Indore, India Riga, Latvia Knoxville, United States New Delhi, India Sydney, Australia New York, United States Boston, United States London, United Kingdom The function to generate these is not complicated. For a detailed tutorial, see this tutorial. 
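If you want one of these maps for your own city, a single call does it. Here\u0026rsquo;s a minimal usage sketch \u0026mdash; the city_mapper function itself comes right after; the place name and file name are only examples:\n# Minimal usage sketch; run the library() calls and the city_mapper()\n# definition below first. Any place name that the OpenStreetMap geocoder\n# recognises should work; the query runs over the internet and can take a while.\np = city_mapper(\u0026#34;Knoxville, United States\u0026#34;)\nggsave(\u0026#34;knoxville-map.png\u0026#34;, plot = p, width = 8, height = 10, dpi = 300)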
Here\u0026rsquo;s the function that I used.\nlibrary(tidyverse) library(osmdata) city_mapper = function(city) { lts = getbb(city) streets = getbb(city)%\u0026gt;% opq()%\u0026gt;% add_osm_feature(key = \u0026#34;highway\u0026#34;, value = c(\u0026#34;motorway\u0026#34;, \u0026#34;primary\u0026#34;, \u0026#34;secondary\u0026#34;, \u0026#34;tertiary\u0026#34;)) %\u0026gt;% osmdata_sf() small_streets = getbb(city)%\u0026gt;% opq()%\u0026gt;% add_osm_feature(key = \u0026#34;highway\u0026#34;, value = c(\u0026#34;residential\u0026#34;, \u0026#34;living_street\u0026#34;, \u0026#34;unclassified\u0026#34;, \u0026#34;service\u0026#34;, \u0026#34;footway\u0026#34;)) %\u0026gt;% osmdata_sf() river = getbb(city)%\u0026gt;% opq()%\u0026gt;% add_osm_feature(key = \u0026#34;waterway\u0026#34;, value = \u0026#34;river\u0026#34;) %\u0026gt;% osmdata_sf() p = ggplot() + geom_sf(data = streets$osm_lines, inherit.aes = FALSE, color = \u0026#34;#282828\u0026#34;, #3C280D size = .5, alpha = .7) + geom_sf(data = small_streets$osm_lines, inherit.aes = FALSE, color = \u0026#34;#909090\u0026#34;, #795C34 size = .4, alpha = .4) + geom_sf(data = river$osm_lines, inherit.aes = FALSE, color = \u0026#34;#03026F\u0026#34;, size = .7, alpha = .8) + coord_sf(xlim = c(lts[1], lts[3]), ylim = c(lts[2], lts[4]), expand = FALSE) + theme_void() + labs(caption = \u0026#34;Learn more: harsh17.in/city-maps\u0026#34;) return(p) } ","permalink":"/city-maps/","summary":"Over the last few days, I dabbled with maps in R. Two days ago, I made a map of all the cities I\u0026rsquo;ve visited. Today, I thought to make street maps of some of them (and other cool cities).","title":"Street Maps (of Some Cities)"},{"content":"Here are some interesting stats about my website. I would probably do this every year in the last week of May.\nAll of these stats are for last three months, except the last one. The number of users is from 2019 to 2022.\nI had 542 visitors and I was meeting 491 of them for the first time. This is encouraging to know. This means my website is gaining audience rapidly. They also visited four pages on average.\nMost visitors were from US, almost 40% Second most number of visitors were from India. Germany as third is surprising.\nWhere\u0026rsquo;s Ashburn? Most visitors are from Ashburn. Where is that? Ashburn, Virginia? Ashburn, Georgia? And why are people from Ashburn so interested in my website. I\u0026rsquo;m curious\u0026hellip;\nSurprise, surprise: Chrome is the most popular browser Safari is second. (I love Safari, especially how beautiful it is. I do wish it handled extensions better to be honest.)\nReferral is getting me most visitors My email signature is not that helpful, I guess. Organic search is great; Google\u0026rsquo;s actually ranking me high enough to have an impact \u0026mdash; at least in some cases. The big daddy of all are Referrals. I want to see who are referring visitors to my website.\nPractically, this is r-bloggers.com alone.\nSurprisingly, my Hindi blogpost is the most visited one in last one month The number of users has consistently increased over time ","permalink":"/some-website-stats/","summary":"This is my digital garden. 
Here is its report card via Google Analytics.","title":"Some Website Stats"},{"content":"जब मैं छोटा था, शिवमंगल सिंह सुमन की ये कविता मेरे बड़े करीब थी। \u0026ldquo;कनक-तीलियों से टकराकर, पुलकित पंख टूट जाऍंगे\u0026rdquo; मुझे आज भी झकझोर कर रख देता है। छोटी आशाओं को पूरा करने की कोशिश में हम कब सोने के पिंजरे में कैद हो जाएंगे, हमें एहसास भी नहीं होगा। लेकिन मैं पंक्षी हूँ उन्मुक्त गगन का, या मेरी सांसों की डोरी तनेगी, या मैं अकुल उड़ान करूँगा।\nहम पंछी उन्मुक्त गगन के\nपिंजरबद्ध न गा पाऍंगे\nकनक-तीलियों से टकराकर\nपुलकित पंख टूट जाऍंगे ।\nहम बहता जल पीनेवाले\nमर जाऍंगे भूखे-प्यासे\nकहीं भली है कटुक निबोरी\nकनक-कटोरी की मैदा से ।\nस्वर्ण-श्रृंखला के बंधन में\nअपनी गति, उड़ान सब भूले\nबस सपनों में देख रहे हैं\nतरू की फुनगी पर के झूले ।\nऐसे थे अरमान कि उड़ते\nनील गगन की सीमा पाने\nलाल किरण-सी चोंच खोल\nचुगते तारक-अनार के दाने ।\nहोती सीमाहीन क्षितिज से\nइन पंखों की होड़ा-होड़ी\nया तो क्षितिज मिलन बन जाता\nया तनती सॉंसों की डोरी ।\nनीड़ न दो, चाहे टहनी का\nआश्रय छिन्न-भिन्न कर डालो\nलेकिन पंख दिए हैं तो\nआकुल उड़ान में विघ्न न डालो ।\n","permalink":"/hum-panchhi-unmukt-gagan-ke/","summary":"शिवमंगल सिंह सुमन की कविता","title":"हम पंछी उन्मुक्त गगन के"},{"content":"It is essential to keep experimenting with new things in life. We don\u0026rsquo;t know what will stick and be successful; we can only take guesses. More often than not, we tend to be risk averse because we don\u0026rsquo;t know enough. However, being a little more optimistic pays off in the long term.\nWriting your ideas down is essential. Your ideas are your babies, and you\u0026rsquo;ll protect them from all harm without ever testing whether they\u0026rsquo;re right. Writing forces you to think critically. You need to write not because you have an assignment due next week but because you need to think.1 Thinking makes you act formidably.\nA simple way I started keeping track of my actions (or inactions, depending on how you see them) was to jot them down in my notes app. In fact, I am writing this monologue in my notes.2 But once I have a firm idea of what I\u0026rsquo;m aiming at, I can easily act on it.\nEven if the ideas I noted down aren\u0026rsquo;t actionable \u0026mdash; and in most cases they are not \u0026mdash; writing them down gives me clarity of thought. Prof Sean often says that we can imagine things in our minds without clarity; only when we write them down do we find out how much we actually understand.\nConsider the example of my newsletter, Next. One fine evening, I was chilling with my roommates on our front porch and this idea came to my mind: writing a #rstats newsletter with five stories, four packages, three jargons, two tweets and one meme. I noted it down. Almost all the details I could think of at that time.\nThese notes in isolation do not make sense. But they give me a good grasp of how things would pan out if I actually started the newsletter.\nA few weeks later when I actually started writing the newsletter, this information became the most important thing I referred to.\nI also experimented with different methods to take notes. I tried Notion, but it was far too organised. I felt like I was spending more time organising what I was writing than actually writing. The Notes app gives me a simple way to search, and that is how I access most information anyway.\nMany fall into the trap of writing with fancy words. While occasionally they\u0026rsquo;re great, mostly they come out as pedantic.
Writing with superfluous words makes our point elusive to readers. Furthermore, if you\u0026rsquo;re writing in English, there\u0026rsquo;s a high chance your reader\u0026rsquo;s first language might not be English.\nFancy writing also conceals the lack of ideas. People with obscure ideas (read: Lawyers) often use fancy words that are designed to confuse the readers. Why else would \u0026ldquo;and/or\u0026rdquo; be a thing?\nWriting about an idea, even about an idea that you thought you knew well, usually shows how little you understood about that idea. Paul Graham has amazing essays4 on why we should write and how to get started. I\u0026rsquo;ll just pull my favourite paragraph from there.\nAs for how to write well, here\u0026rsquo;s the short version: Write a bad version 1 as fast as you can; rewrite it over and over; cut out everything unnecessary; write in a conversational tone; develop a nose for bad writing, so you can see and fix it in yours; imitate writers you like; if you can\u0026rsquo;t get started, tell someone what you plan to write about, then write down what you said; expect 80% of the ideas in an essay to happen after you start writing it, and 50% of those you start with to be wrong; be confident enough to cut; have friends you trust read your stuff and tell you which bits are confusing or drag; don\u0026rsquo;t (always) make detailed outlines; mull ideas over for a few days before writing; carry a small notebook or scrap paper with you; start writing when you think of the first sentence; if a deadline forces you to start before that, just say the most important sentence first; write about stuff you like; don\u0026rsquo;t try to sound impressive; don\u0026rsquo;t hesitate to change the topic on the fly; use footnotes to contain digressions; use anaphora to knit sentences together; read your essays out loud to see (a) where you stumble over awkward phrases and (b) which bits are boring (the paragraphs you dread reading); try to tell the reader something new and useful; work in fairly big quanta of time; when you restart, begin by rereading what you have so far; when you finish, leave yourself something easy to start with; accumulate notes for topics you plan to cover at the bottom of the file; don\u0026rsquo;t feel obliged to cover any of them; write for a reader who won\u0026rsquo;t read the essay as carefully as you do, just as pop songs are designed to sound ok on crappy car radios; if you say anything mistaken, fix it immediately; ask friends which sentence you\u0026rsquo;ll regret most; go back and tone down harsh remarks; publish stuff online, because an audience makes you write more, and thus generate more ideas; print out drafts instead of just looking at them on the screen; use simple, germanic words; learn to distinguish surprises from digressions; learn to recognize the approach of an ending, and when one appears, grab it.\nI\u0026rsquo;ll conclude by highlighting what Prof Sean says it on his website:\nWe write for two reasons. First, to document for others what we have done. Second, to prove to ourselves that we understand what we think we understand.\nJordan Peterson\u0026rsquo;s guide to writing an essay is a helpful guide if you\u0026rsquo;re lost at where to start persuasive writing. Access it here.\nSometime in my first year of undergrad, there was a creative writing exam. I got a C-. I decided to improve my writing and met Prof Dibyaduti Roy for specific feedback. He told me to practice more writing. 
This website (and its predecessors) is somewhat a product of that feedback. Since then, my writing has certainly improved \u0026mdash; although nowhere close to how I\u0026rsquo;d like.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nNow part of the blog.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nPopularity of search engines has made students oblivious to directories and folder-file organization.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWriting Briefly. Putting Ideas into Words.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/trying-new-things/","summary":"It is essential to keep experimenting with new things in life. We don\u0026rsquo;t know what would stick and be successful; we can only take guesses. Take notes; that\u0026rsquo;s the only way to keep a log.","title":"Trying New Things"},{"content":"A university\u0026rsquo;s website tells a lot about it. Harvard \u0026mdash; like all things in education and research \u0026mdash; is a prime example.\nToday\u0026rsquo;s topic is Mental Health. On any given day, it would be something else \u0026mdash; racial bias, COVID-19, sports, education, literally anything. They put together a bunch of research done by their faculty on that topic, along with articles published by The Crimson or other such publications.\nWhen internal resources can be linked, they do that. Notice how they added a link to their libraries by calling out \u0026ldquo;Library resources to support mental health\u0026rdquo;.\nSuch a showcase reveals what Harvard\u0026rsquo;s interested in. For people looking for random nuggets online, it is a gold mine.\nOn the other hand, consider MIT\u0026rsquo;s website.\nThis is again cool. On one side, they provide a good and useful search engine for any specific information you could be looking for. The other side shows cool research that MIT researchers have been working on.\nTennessee\u0026rsquo;s website is very audience-focussed. They aim to have great undergraduate programmes and thus attract the best undergraduate students. Research is not the center of the world. It is lost in a hyperlink about \u0026ldquo;Dig deeper with research opportunities\u0026rdquo;.\nIn essence, Tennessee is trying to sell itself \u0026mdash; we are great, we help our students, we have great sports teams, come visit us! This is notably different from Harvard. Harvard ignores its own existence; rather, it focuses on the ideas. Ideas that its researchers are working on, or ideas that are exciting and relevant.\nSome day, I hope to revamp my website again along the lines of Harvard \u0026mdash; showcase ideas instead of reiterating facts. That will require significant effort and I will have to move to Wix or something. But, I will do it. Probably this summer? Let\u0026rsquo;s see.\n","permalink":"/harvard-website/","summary":"A university\u0026rsquo;s website tells a lot about it. Harvard — like all things in education and research — is a prime example.","title":"Harvard's Website"},{"content":"For details, read the complete report. This project was part of Prof Wenjun Zhou\u0026rsquo;s Machine Learning class at University of Tennessee.\nWhat is it? Most clustering algorithms work for numerical variables where the variables are assumed to be continuous and random. In this short monograph, I proposed a probability-based distance measure for computing dissimilarity between observations for discrete variables thought to be randomly distributed. 
As their probabilities are derived empirically, there is no underlying assumption on their distribution.\nHow do I do it? Consider two discrete random variables \\(X_1\\) with \\(u\\) different classes and \\(X_2\\) with \\(v\\) different classes. Let \\(\\{c_{11}, c_{12}, \\dots, c_{1u} \\}\\) be the set of different classes of \\(X_1\\). Similarly, let \\(\\{ c_{21}, c_{22}, \\dots, c_{2v}\\}\\) be the set of different classes of \\(X_2\\). The empirical probability of event \\(X_1 = c_{1i}\\) is \\(\\frac{m}{n}\\), where \\(m\\) is the frequency of \\(c_{1i}\\) observed in \\(X_1\\) and \\(n\\) is the total number of observations.\nAssuming that the sample is representative of the population, we can calculate the empirical probability of each class for each variable. Once we have those probabilities, we can calculate the joint probability for an observation that I call \u0026ldquo;score\u0026rdquo;. This score is a number between 0 and 1.\nInterpretation A score of zero is asymptotically possible but will not occur in real-world analysis. If the researcher assumes no prior knowledge about the variable, only the existing classes observed in the data can be used as possible classes. In that case, the score cannot be zero for any observation. However, if the researcher assigns probabilities to classes that weren\u0026rsquo;t observed in the data, some classes can end up with zero probability. A score of one is possible only when all observations are precisely the same.\nIn most cases, the value for each observation would lie between zero and one. The closer two observations\u0026rsquo; scores are, the more similar the observations are likely to be (although this is not guaranteed, as we will see in the following example.)\nPros and Cons The proposed method shines when we do not assume any prior probability distribution for the variables. Since it relies on the empirical distribution, it estimates the class probability for a discrete variable based only on the available observations. However, this benefit comes at a (potential) cost. A biased sample will significantly affect the empirical probability and thus the score. It may not be reliable in such cases.\nIt is also possible that multiplying many small probabilities will lead to very small values of the score. When calculating the empirical probabilities, we will typically have small values \u0026mdash; less than about 1/3 if there are three classes, say. If there are five such variables, the \u0026ldquo;average\u0026rdquo; score would be \\(0.3^5 = 0.00243\\), which is very small.\nThis limitation has an easy fix. We could easily scale the score by multiplying it by a large constant \\(C\\) to bring it on the same scale as the rest of the variables. This will ensure that the clustering algorithm doesn\u0026rsquo;t penalise this variable for a small default value.\nQuick Example Let me illustrate the method with a small example. Consider the following data with four discrete variables and no continuous variable.\nThe variables have different probability distributions. The probability of being a Male is \\(2/5\\); being a Female is \\(3/5\\). The probability of the City being New York is 2/5; Shanghai, Boston or New Delhi are all equal to 1/5 each. The probability of the favourite colour being Blue is \\(2/5\\); Black, White or Red are at \\(1/5\\) each. Finally, being an executive is \\(3/5\\), and the probability of being a non-executive is \\(2/5\\).\nAssuming that all variables are independent of each other, the probability that a person is Male, lives in Shanghai, whose favourite colour is Blue and who is an executive is \\(2/5 \\times 1/5 \\times 2/5 \\times 3/5 = 12/625 = 0.0192\\). I call this joint probability an observation\u0026rsquo;s score. We could repeat the exercise for all the observations, and we will obtain the results presented in the last column of the table.
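To make the calculation concrete, here is a minimal R sketch of the score computation. The data frame below is made up for illustration (it is not the table from the report), and independence between variables is assumed, as above.\nemp_prob = function(x) as.numeric(table(x)[x] / length(x)) # empirical probability of each row's class\ndf = data.frame(gender = c(\u0026#34;Male\u0026#34;, \u0026#34;Male\u0026#34;, \u0026#34;Female\u0026#34;, \u0026#34;Female\u0026#34;, \u0026#34;Female\u0026#34;), city = c(\u0026#34;Shanghai\u0026#34;, \u0026#34;New York\u0026#34;, \u0026#34;New York\u0026#34;, \u0026#34;Boston\u0026#34;, \u0026#34;New Delhi\u0026#34;), colour = c(\u0026#34;Blue\u0026#34;, \u0026#34;Blue\u0026#34;, \u0026#34;Black\u0026#34;, \u0026#34;White\u0026#34;, \u0026#34;Red\u0026#34;), executive = c(\u0026#34;Yes\u0026#34;, \u0026#34;Yes\u0026#34;, \u0026#34;Yes\u0026#34;, \u0026#34;No\u0026#34;, \u0026#34;No\u0026#34;))\n# score = row-wise product of per-variable empirical probabilities\ndf$score = Reduce(`*`, lapply(df, emp_prob))\ndf\nFor the first row, this gives \\(2/5 \\times 1/5 \\times 2/5 \\times 3/5 = 0.0192\\), matching the hand calculation above.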
This continuous measure that I call \u0026ldquo;Score\u0026rdquo; can measure dissimilarity between observations. Note that the method doesn\u0026rsquo;t guarantee a distinct score for distinct observations. Even observations with precisely the same score can differ from one another. However, observations with very different scores would inevitably be different observations. The latter property is more critical when deciding which cluster an observation belongs to.\nSimulations In this section, I will compare the clusters found using three methods: (1) using only continuous variables, (2) using continuous variables and the score, and (3) using Gower\u0026rsquo;s distance. For the purpose of this simulation, I will use the flower data available in the cluster package in R.1\nResults Clusters with only continuous variables The clusters obtained from the continuous variables seem to have accounted only for V7 in differentiating between the observations. See the figure below for a scatter plot.\nClusters with continuous variables and my proposed \u0026ldquo;score\u0026rdquo; Clusters with Gower\u0026rsquo;s Distance The clusters obtained from Gower\u0026rsquo;s Distance are presented below.\nConclusion In this short monograph, I presented a new distance metric based on empirical joint probability. With a small simulation on the flower data, I showed how effective it is compared to not using categorical variables at all. I also compared the results with Gower\u0026rsquo;s distance-based clustering. I found that the results from the three methods do not match exactly. My method shows some improvement over not using categorical variables. However, Gower is a smart guy and his method performs better than my naive method. :)\nFor more details, see https://cran.r-project.org/web/packages/cluster/cluster.pdf. This dataset was first published by Struyf, Hubert and Rousseeuw (1996).\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/joint-probability-based-dissimilarity-measure-for-discrete-variables/","summary":"A Simple Method to Calculate Distance between Discrete Variables","title":"Joint Probability-based Dissimilarity Measure for Discrete Variables"},{"content":"\nOnce upon a time, there was a nail-making factory during the peak of the industrial revolution in rural England. The workers arrived there every morning at 8 am and worked till 4 pm to make nails. Just plain nails. It was a risky job: the nail sharpeners wouldn\u0026rsquo;t differentiate between the nails and fingers of the workers. On average, a worker made about 300 nails in a day \u0026mdash; with 8 hours of hard labour.\nOne day, the factory owner\u0026rsquo;s son visited the plant. Studying at Cambridge, he thought the task of making nails was menial, and he could do it better. He told his father, \u0026ldquo;Papa, these men are wasting your money. I am sure I can produce more than 300 nails in a day. I see the process, and it takes hardly a minute to make one.\u0026rdquo; His father replied, \u0026ldquo;I see. 
Why don\u0026rsquo;t you try?\u0026rdquo;.\nThe son got on a workstation and started making nails. He pulled out his shiny stopwatch to keep track of time. The first one was done in 30 seconds. The second one in 29 seconds. Third in 25 seconds. If he could continue this pace, he\u0026rsquo;d far outproduce these workers. And he did.\nTwo hours later, he showed up to his father and said, \u0026ldquo;Papa, see, I have four hundred nails already. I\u0026rsquo;m already more productive than your workers\u0026rdquo;. His father smiled and said, \u0026ldquo;Why don\u0026rsquo;t you try another two hours?\u0026rdquo;. He did. But he only got 300 nails. He showed up to his father and said, \u0026ldquo;See, I\u0026rsquo;m a little worse but still as productive as your daily worker\u0026rdquo;. His father told him to try again. This time, he only got a hundred nails.\nHe showed up to his father like a whimpering kid with a hanging jaw. This time, he only showed his nails and said, \u0026ldquo;a hundred\u0026rdquo;. His father asked him to sit down and think.\nIf you can do a job super fast, ask yourself if it\u0026rsquo;s the job that\u0026rsquo;s unproductive or if your excitement to do it is making you productive. Would this excitement last a week? A month? A year? If, day in and day out, you were only making nails, would you keep making 400 a day?\nAdded on July 24, 2022 I was listening to this podcast this morning, and the author said that grit drives most success far more than talent. She said, \u0026ldquo;effort counts twice\u0026rdquo;.\nSometimes, people think it\u0026rsquo;s all grit \u0026mdash; talent has no role. That\u0026rsquo;s not the case. Talent is the rate at which you get better at something, i.e. develop your skill. When you apply effort to get better, you grow in skills. When you don\u0026rsquo;t use effort to improve, you don\u0026rsquo;t grow in skills. If you\u0026rsquo;re talented, the same amount of effort will develop your skill more.1\n$$Skill \\times Effort = Achievement$$\nAnd skill itself is built with effort. So, if Skill is decomposed as \\(Talent \\times Effort\\), then \\(Achievement = Talent \\times Effort^2\\), and we see how Effort counts twice. Skill has to be applied to have any beneficial impact.2\nGrit has two components to it. The first is perseverance; the second is passion. Perseverance drives humans to continue doing what they are doing even if they face temporary failures. We want to improve in the long term and not get distracted easily in the near term. Passion is what helps us decide the act. It dictates what job you\u0026rsquo;re focussing on. Many organisations and humans, in general, tend to overvalue talent and undervalue grit.\nIt isn’t easy to get passionate about making nails. The business owner’s son had the passion for improving but not the perseverance to keep doing the grunt work. The workers had endurance but lost passion due to the monotonous nature of the work. A successful person needs a balance of both.\nThis is my understanding from listening to the podcast. I didn\u0026rsquo;t verify the equation from the book.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nHarsh from MBA times would\u0026rsquo;ve marvelled at this equation. Present day Harsh finds equations like this good for communication but otherwise meaningless.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/making-nails/","summary":"Once upon a time, there was a nail-making factory during the peak of the industrial revolution in rural England. 
The workers arrived there every morning at 8 am and worked till 4 pm to make nails. Just plain nails. One day, the factory owner\u0026rsquo;s son visited the plant.","title":"Making Nails"},{"content":" 🧳 Map of Cities Touched ⛰️ Hikes These are only the hikes in the US. My hikes in India and Europe aren’t included.\n| Name | Location | Roundtrip Length | Notes |\n| --- | --- | --- | --- |\n| Alum Cave | Tennessee, US | 4.4 miles | Amazing views; easy hike; if you choose not to hike up to Mt. LeConte, you can return from the caves. |\n| Mt. LeConte via Alum Cave | Tennessee, US | 11.0 miles | Great views; gets successively harder beyond Alum Cave. There is a lodge at the top for overnight stay. |\n| Virginia Creeper Trail | Virginia, US | 34.3 miles | 10/10 recommended bike trail; you’ll get dirty, so prepare accordingly. |\n| Mouse Creek Falls / Midnight Hole | North Carolina, US | 4 miles | Easy trail; lots of space for a party. Small cliffs to dive as well. |\n| Ozone Falls | Tennessee, US | 1 mile | One of its kind waterfall where you can go behind the waterfall. Length is short but it’s rocky and straight downhill so takes time. |\n| Cade’s Cove | Tennessee, US | 10.5 miles | Famous for wildlife; I didn’t see any 🫠 |\n| Charlie’s Bunion | Tennessee, US | 8.0 miles | Delivered what’s promised; loved it |\n| Shuckstack Fire Tower | North Carolina, US | 6.6 miles | Uphill is tougher but the views from the tower are worth it. |\n| Lacamas Creek Park | Vancouver, Washington State | 3.5 miles | Easy hike with beautiful scenery |\n| Dry Creek Falls | Hood River County, Oregon | 4.4 miles | Mountainous terrain, fire-parched trees, beautiful waterfall |\n| Clingman’s Dome | North Carolina, US | 1.3 miles | Easy walk, beautiful views. Paved roads. Highest point in the Smoky Mountains. |\n| Grotto Falls, Trillium Gap Trail | North Carolina, US | 2.6 miles | Good hike, very crowded though |\n| Norris Dam Park (River Bluff Trail) | Tennessee, US | 3.0 miles | Alright — not great |\n| Rocky Top | Tennessee, US | 4 miles | Flagship for UTK people, steep and rocky. It is hard with okay views. |\n| Brushy Mountains, Trillium Gap Trail (via Grotto Falls) | Tennessee, US | 11.2 miles | On Mondays you can see porters with llamas, crowded till Grotto Falls. |\n| Cape Disappointment Trail | Long Beach, Washington | 1.9 miles | Easy but beautiful trail passing through Lewis-Clark Observatory and Lighthouse |\n| Arches National Park | Moab, Utah | 30 miles | Astonishing views of the arches. Make stops and get an e-bike. |\n🪣 Bucket List\nBig South Fork\nBald Falls\nMax Patch\nStone Mountain\nRock City, Chattanooga\nSound of Music\nFall Creek Falls\nGrayson Highlands\nCades Cove — biking\n🚴 Bike Rides Since I got my VanMoof, I’ve been going on bike trips every so often.\n| Name | Location | Notes |\n| --- | --- | --- |\n| Baker Creek Preserve | Knoxville, TN | First mountain bike experience. |\n| Burnt Bridge Creek Greenway | Vancouver, WA | Easy and simple ride. |\n| Salmon Creek Trail | Vancouver, WA | Great scenery, spotted a deer. |\n| Lewis-Clark Discovery Greenway | Vancouver, WA | Passes through some cool architecture. |\n| Waterfront Loop | Portland, OR | 11/10.1 |\n| Third Creek Greenway | Knoxville, TN | Easy, close to UT |\n| Neyland Greenway, Cherokee Farmway | Knoxville, TN | Absolutely beautiful; never crowded. |
🪂 Adventure Sports\n| Sport | Location | Review |\n| --- | --- | --- |\n| Skydiving | The Netherlands | best adrenaline rush ever |\n| Rafting (still-water, white-water) | Jawahar Ghat, Jharkhand; Manali, HP | too hyped; waves make things hard but also more fun |\n| Kayaking | Jawahar Ghat, Jharkhand; Ijams, Knoxville, TN; Lake Merwin, WA | relaxing if water is calm; else tiring but fun |\n| Para-sailing | Jawahar Ghat, Jharkhand | fun; kind of like a roller coaster |\n| Para-gliding | Manali, HP | runway to start and landing are the best parts |\n| Skiing | Latvia | easy to learn; difficult to master |\n| Surfing | Gokarna, Karnataka | nothing like it; waves hit you hard |\n| Scuba-diving | Fort Dickerson Quarry, Knoxville | not great unless you have clear water |\n| Mountain Biking | Baker Creek Trail, Knoxville, TN | amazing trails (easy, medium and hard); artificial ramps for skating and biking are great too |\n| Bungee Jumping | Pacific Northwest Bridge, Amboy, WA | jump is scary; everything else is fun! the high only lasts for a few seconds though |\n| Paddle-boarding | Ijams, Knoxville, TN | use paddle board for diving; paddle-boarding is boring by itself :) |\n🪣 Bucket List\nCliff Jumping\nScuba with Bull sharks, Beqa Lagoon and Yasawa Islands - Fiji\nTonga, whale swimming\nMarsa Shagra, Egypt - untuned diving in Red Sea\nBonus point for Dea’s company.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/adr/","summary":"Places travelled, hikes, bike rides and adventure sports","title":"Adrenaline Activities"},{"content":" “Wealth inequality is increasing!”, “Rich people don’t pay enough taxes”, “THEY aren’t doing enough” — we have all heard these lines at some point in our life. Salaried people protesting against super-high taxes, college students protesting on behalf of everyone, and, of course, Bernie Sanders.\nHow much do the super rich really pay? Until now, the answer was unknown — thanks to privacy laws. Recently, tax returns of the super rich were leaked to ProPublica. I thought of exploring the billionaires and their tax rates.\nData I scraped data for the top 400 wealthiest individuals in the US by their income reported to the Federal government. Here’s the CSV file for the same.\nDownload income_taxes.csv\nThe “income” described in this article is adjusted gross income. To calculate income taxes, we used the IRS definition of “total income tax,” which excludes self-employment tax and a few other non-income taxes that appear on Form 1040. The effective income tax rates we present are weighted averages — that is, the sum of income tax from 2013 to 2018 divided by the sum of adjusted gross income over that period. We updated our figures to reflect any amended filings or audits in our database.
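Here is a minimal R sketch of that weighted average. The column names (name, income_tax, adjusted_gross_income) are my assumption; the real headers in income_taxes.csv may differ:\nlibrary(dplyr)\nread.csv(\u0026#34;income_taxes.csv\u0026#34;) %\u0026gt;%\n group_by(name) %\u0026gt;% # one effective rate per person\n summarise(effective_rate = sum(income_tax) / sum(adjusted_gross_income))\nBecause the division happens after the sums, high-income years weigh more than low-income years in each person’s rate.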
Average Tax Rates for Richest 400 Americans There’s high variance among high income earners. The general trend hints at lower income tax rates as the income increases. Note that the x-axis scale is in \\(\\log(x)\\). That means, the actual income is \\(10^x\\).\nThis is rather difficult to read. Let me separate it into a grid.\nSector-wise Average Tax Rates for Richest 400 Americans Who Paid The Highest Tax? I calculate total taxes paid in five years as five times average income times average effective tax rate.\nTaxes vs Income The “average” hides a lot of information. I want to compare total taxes in the last eight years vs total income in the last eight years.\nHighest Effective Tax Rates Table with “Name Withheld” removed. Average Tax Rates by Sector Heirs with financial business have the highest average income. People from manufacturing pay the highest federal taxes.\nCurious Case of Name Withheld I have no clue what that means. The original article gives no explanation of what it means.\n","permalink":"/billionaires-and-taxes/","summary":"How much do the super rich really pay? Until now, the answer was unknown \u0026mdash; thanks to privacy laws. Recently, tax returns of the super rich were leaked to \u003ca href=\"https://projects.propublica.org/americas-highest-incomes-and-taxes-revealed/\"\u003eProPublica\u003c/a\u003e. I thought of exploring the billionaires and their tax rates.","title":"Billionaires and Taxes"},{"content":"There are three types of arguments.\nYou can argue about stuff that happened in the past. Did you do the right thing? Was it best for everyone, especially the other person in the argument? Most often, this ends up being a blame game. Blame games are value clashes. At all costs, identify such an argument, realise if it\u0026rsquo;s a blame game and avoid it like the plague.\nYou can argue about things that are happening right now. Did we choose the right restaurant for lunch? Is this burger good? Most often, these are opinion clashes. Your preferences are likely different from the other person\u0026rsquo;s. Appreciate the diversity of opinions and realise it\u0026rsquo;s likely not an argument. It is a discussion.\nYou can argue about things in the future. These are choices you have to make \u0026mdash; together with the other person. You must explain your preferences, and the other person does the same. It is the only argument worth having, the one with a potentially useful result.\nIn such cases, make an effort to reach an understanding. We might still have differences \u0026mdash; fundamental value differences \u0026mdash; but strive for common ground. Believe it or not, you will always have some common understanding of the issue. Start from first principles of why it makes sense for them and find where you start diverging.\nThe end goal is not to reach an agreement but to be aware of each other\u0026rsquo;s understanding.\n","permalink":"/arguments/","summary":"There are three types of arguments. You can argue about what happened (past), you can argue about what\u0026rsquo;s happening (present), or you can argue about what\u0026rsquo;s gonna happen (future).","title":"Arguments"},{"content":"Websites used to be developed by groups of people to meet the needs of other groups of people. Today, as the internet grows more personalised than an encyclopedia of information, I argue we need more personal websites. Social media platforms are limiting in how they treat your content. Your message might be curtailed by what LinkedIn allows or 280 characters on Twitter.\nAcademics, especially grad students, need personal websites even more, as a few CV pages cannot include most details. When they look for work, a website clarifies uncertainties about the candidate.\nContrary to what many think, maintaining a personal website is neither difficult nor expensive. Unfortunately, creating a website is approached as a \u0026ldquo;technology problem\u0026rdquo; to be solved. Projects are coloured from the beginning by enthusiasm for, or fear of, HTML, CSS and other fancy jargon \u0026mdash; when it doesn\u0026rsquo;t have to be so.\nIn this talk, I will discuss why academics should have a personal website. I will also guide you through designing a website and hosting it with a live hands-on example using Owlstown.\nPoster Example Sites Here are some example sites.\nProfessors\n| Name | Affiliation | Website |\n| --- | --- | --- |
| Dennis C. Rasmussen | Syracuse University | https://www.dennis-rasmussen.com/ |\n| Jonathan Ochshorn | Cornell University | https://jonochshorn.com/index.html |\n| Cynthia Rudin | Duke University | https://users.cs.duke.edu/~cynthia/ |\n| Laura Albert | University of Wisconsin, Madison | https://punkrockor.com/ |\n| Sean Willems | University of Tennessee, Knoxville | https://seanwillems.com/ |\n: Examples of personal websites (professors)\nIndustry\n| Name | Affiliation | Website |\n| --- | --- | --- |\n| Hadley Wickham | RStudio | https://hadley.nz/ |\n| Rami Krispin | Apple | https://ramikrispin.github.io/ |\n| Debarghya Das | Glean | http://debarghyadas.com/ |\n| Alison Hill | IBM | https://www.apreshill.com/ |\n| Brett Wendling | Federal Trade Commission | https://brettwendling.owlstown.net/ |\n: Examples of personal websites (industry)\nGrad Students and Researchers\n| Name | Affiliation | Website |\n| --- | --- | --- |\n| Sander van Bree | University of Glasgow | https://www.sandervanbree.com/ |\n| Neha Gupta | Duke University | https://nehargupta.github.io/ |\n| Jared Colston | University of Wisconsin-Madison | https://www.jaredcolston.com/ |\n| Sajjad Amrollahi Biyouki | University of Tennessee, Knoxville | https://sajjadbiyouki.github.io/ |\n| Slim Lim | University of California, Berkeley | https://slim.computer/ |\n: Examples of personal websites (students and researchers)\nSee Owlstown\u0026rsquo;s directory for more examples. See this Twitter thread 🧵 for even more examples.\n","permalink":"/personal-websites-for-academics/","summary":"Kick-off Workshop for University of Tennessee\u0026rsquo;s INFORMS Chapter","title":"Personal Websites for Academics"},{"content":"A few days ago I shared my personal Zoom room link with a few people: https://www.harsh17.in/zoom. The neatness of this in comparison to something like https://zoom.us/j/99672273048?pwd=eXV5R2pBR0FqNlBUWmtLdCt6THl3dz09 was amazing. It had two immediate effects:\nSharing my Zoom link wasn\u0026rsquo;t a tedious task anymore. I could type it anywhere from memory. People started taking my Zoom calls more seriously. I\u0026rsquo;d like to believe it was due to my content but the short URL certainly had an effect. How did I do it?\nBlogdown.\nR is amazing and can easily be extended to new functionalities. Blogdown is an R package for creating personal websites.\nYou can create your own set of URL redirects in Blogdown. Instead of typing long URLs, you can create your own custom short URLs of the form yourwebsite.com/shorturl.\nStep 1 Go to the folder /static/ located in your personal website folder. It may or may not have any content already. If you do not have that folder, create one right beside other folders such as content, layout and themes.\nPersonal website directory. Create a folder static here if you don\u0026rsquo;t have it already.\nStep 2 Open your text editor (Notepad in Windows and TextEdit in macOS) and create a new file named _redirects with no extension. Your computer will tell you to add an extension; ignore the warning and go ahead.\nStep 3 Write your URLs there. Start with the short URL, then the long URL, separated by at least two tab spaces. See my example below.
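A minimal sketch of what such a _redirects file can look like \u0026mdash; the Zoom target is the long URL quoted above, while the second rule (and its handle) is purely illustrative:\n# Zoom room\n/zoom    https://zoom.us/j/99672273048?pwd=eXV5R2pBR0FqNlBUWmtLdCt6THl3dz09\n# Social media (illustrative handle)\n/twitter    https://twitter.com/your_handle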
Remember to leave appropriate space (at least two tabs) between the short and long URLs. Handle \u0026lsquo;/\u0026rsquo; properly.\nYou can have comments starting with #.\nThat\u0026rsquo;s all!\nCommit your edits to Github and give it two minutes to deploy via Netlify. Try your short URL and it should work.\nSome Tips Create short URLs for articles or websites you frequently share or visit. For example, I have created short URLs for my social media profiles because twitter.com would take me to the home page instead of my profile page.\nCreate a short URL for your Zoom room link. This one is a no-brainer.\nCreate short URLs for content you share often. I frequently share links to my Newsletter and my article on IPM admissions. So, I created short URLs for them that I can type as I go.\nTroubleshooting Here are some tips in case you run into some troubles.\nMake sure you leave at least two tab spaces between the short URL and long URL.\nThe trailing / matters. See that you are using the right short URL, with or without / at the end.\nCommit to Github and wait for results. Netlify takes a few minutes to deploy your new site.\nIf you still have troubles, write to me at hello@harsh17.in and we can resolve it together.\n","permalink":"/creating-your-own-short-urls-with-blogdown/","summary":"Here\u0026rsquo;s how to make your short URLs using Hugo + Blogdown","title":"Creating your own short URLs with blogdown"},{"content":"Many years ago, Schrodinger figured out that imaginary numbers are the only way to make sense of reality. Professor Freeman Dyson described it best in his lecture1:\n\u0026hellip;But then came the surprise. Schrodinger put the square root of minus one into the equation, and suddenly it made sense. Suddenly it became a wave equation instead of a heat conduction equation. \u0026hellip;And that square root of minus one means that nature works with complex numbers and not with real numbers.\nImaginary numbers are very real. Consider trigonometry. The angles and ratios are natural, right? You can write them using exponents of irrational numbers (something that we cannot count) raised by imaginary numbers (something that doesn\u0026rsquo;t precisely exist).\n\\(\\exp(ix)\\) can be decomposed into \\(\\sin (x)\\) and \\(\\cos(x)\\) using the following formulas.\n$$ \\exp(ix) = \\cos (x) + i \\sin(x), $$\n$$ \\exp(-ix) = \\cos(x) - i \\sin(x). $$\nNow, add the two equations to get the value of \\(\\cos(x)\\); subtract them to get \\(\\sin(x)\\).\n$$ \\cos(x) = \\frac{\\exp(ix) + \\exp(-ix)}{2}, $$\n$$ \\sin(x) = \\frac{\\exp(ix) - \\exp(-ix)}{2i}. $$\nI used this trick to solve my high school trigonometry problems. In a broader sense, this also means that we can represent a natural angle and ratio in terms of quantities we can\u0026rsquo;t define as naturally in real life. Isn\u0026rsquo;t this beautiful?\nThe only way to make sense of reality is to borrow from the imaginary world. We will create non-sense tools that speak well with what we do know. Some imaginary tools that bind with reality as we know it. Some day we will be able to reconcile all of that knowledge together. There is a lot we do not know.\nDyson, F. (2009). Birds and Frogs. Chinese Journal of Nature. PDF.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/imaginary-reality/","summary":"We know less than we think and we don\u0026rsquo;t even know what we don\u0026rsquo;t know.","title":"Imaginary Reality"},{"content":"Imagine a folder A whose content is to be copied to a folder B. A has five subfolders, each with 1, 2, 3, 4 and 5 files, respectively. For simplicity, consider the case that each file is of equal size. When copying the files from A to B, how would you measure the progress?\nTotal number of files is 1 + 2 + 3 + 4 + 5 = 15. One way to measure progress is in intervals of 1/15: the progress bar would advance evenly in steps of \\(1/15 \\approx 6.7\\%\\).\nAnother way is to give each subfolder equal weight, splitting that weight among its files: 1 file worth 1/5, 2 files worth 1/10 each, 3 files worth 1/15 each, and so on. In that case, when the first folder is done, the progress would be 20%. Then it\u0026rsquo;ll progress to 30%, 40%, then 46.7%, 53.3%, 60%, and so on. The slowly decreasing rate of progress would be incredibly frustrating to watch. Most computer programs prefer the first version, but the second alternative is possible, as the sketch below shows.
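Here is a quick base R sketch of both progress sequences for the folder sizes in the example:\nfiles_per_folder = 1:5\n# Method 1: every file counts equally\nmethod1 = cumsum(rep(1 / sum(files_per_folder), sum(files_per_folder)))\n# Method 2: every folder counts equally, its 1/5 share split among its files\nmethod2 = cumsum(unlist(lapply(files_per_folder, function(k) rep(1 / (5 * k), k))))\nround(method1, 3)\nround(method2, 3)\nMethod 1 climbs in equal steps of about 0.067; Method 2 jumps straight to 0.2 and then crawls.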
This example might sound trivial and unreal, but it is incredible how often it turns up in real practical problems \u0026mdash; including cooking and supply chain statistics.\nMy roommate Tagg has a unique way of making ramen noodles. He would bring the water to a boil, pour the ramen in, leave it for less than a minute and put many seasonings on top. My other roommate Jack pours most of the water out and then lets it sit with the seasoning to soak in the spices. Tommy and Jake boil it with the herb and dry every drop of water. They like raw ramen noodles.\nWhich one\u0026rsquo;s better? I can\u0026rsquo;t say definitively. (Although I like Tagg\u0026rsquo;s method, this NY Times recipe is the best.)\nDevils in the Details I am working on a research project with a hygiene products company based in North Carolina. It\u0026rsquo;s facing returns, sometimes up to 15% of its sales. Prof Sean and I were trying to find out why. We found opportunities to streamline distribution using their data for sales, transportation, and claims.\nThis problem of choosing \u0026ldquo;how\u0026rdquo; to calculate the metric turned up in something I thought was super simple. The company gave us sales, transportation and product datasets. See the following examples. Of course, they\u0026rsquo;re not real, but they give you a good idea.\nThese datasets have random values and aren\u0026rsquo;t real. But they give you a taste of what the company provided us.\nDeciding on the metrics is way more complicated than I initially thought. Suppose you want to estimate how many complete pallet orders were shipped from a location. Where do you start? Well, each item was in a carton which was in a pallet. So maybe, that\u0026rsquo;s a reasonable starting point.\nWe want to estimate the proportion of orders from a location in full pallets. There are at least two methods to find it.\nFirst, I find the number of full pallets for every row since each row (in the Sales sheet) is an order-item combination. Group all the entries by order number; then, you can find what proportion of cases were sent in full pallets. But that is for every order, and we wanted to get metrics by location. So, you can aggregate the results again by (City, State) and calculate the average proportion of full-pallet cases.\nOr, the other method is to group by (City, State) without first grouping by order number. This would disregard which items were part of which order \u0026mdash; breaking 1-to-1 matches. Some orders would be higher volume than others. There\u0026rsquo;s no reason the two methods should give the same answer unless all orders were identical.\nThis situation of defining the right metric turns up in so many different ways. How we aggregate things together matters because the end product depends not only on raw materials but the method as well. Simple things aren\u0026rsquo;t as intuitive as one might think. The sketch below shows how the two aggregations differ.
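A minimal dplyr sketch of the two aggregations. The sales table and its column names (order_id, city, state, full_pallet_cases, total_cases) are assumptions for illustration, not the company\u0026rsquo;s actual schema:\nlibrary(dplyr)\n# Method 1: per-order proportion first, then average by location\nsales %\u0026gt;%\n group_by(city, state, order_id) %\u0026gt;%\n summarise(prop_full = sum(full_pallet_cases) / sum(total_cases), .groups = \u0026#34;drop\u0026#34;) %\u0026gt;%\n group_by(city, state) %\u0026gt;%\n summarise(avg_prop_full = mean(prop_full))\n# Method 2: pool all rows by location, ignoring order boundaries\nsales %\u0026gt;%\n group_by(city, state) %\u0026gt;%\n summarise(prop_full = sum(full_pallet_cases) / sum(total_cases))\nUnless every order has the same volume, the two numbers will differ.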
Ultimately, we have to use the metric that the company likes to use.\nA general note on metrics If people have to perform calculations on your metrics to generate insights, they\u0026rsquo;re not good metrics.\nSpecify what your metric represents and what it doesn\u0026rsquo;t represent. You\u0026rsquo;ll avoid situations where your metric is misused.\nAlways consider what the company or client thinks about your metrics and their businesses. If they disagree with your formulation, your metric will be just another number.\nThere is more variability than what a statistical model can capture. Listen to managers; they have more knowledge about their businesses than you\u0026rsquo;d ever have.\n","permalink":"/lets-say-you-re-copying-a-folder-to-a-folder/","summary":"What should the progress bar show? How to measure 10% work done?","title":"Let’s say you’re copying a folder to a folder"},{"content":"I interviewed Prof Emre Demirkaya from my department at the University of Tennessee. This was part of my seminar course on research by Prof Sean Willems (my advisor too 🚀). This essay is my reflection on our conversations.\nCareer Path and Research Questions Prof Emre did his undergraduate in mathematics. As he learnt more mathematics, his interest grew in applying those learnings to practical problems. After finishing his undergraduate degree, he joined Applied Mathematics at USC. After completing his comprehensive exam, he chose a field where he could try coding and simulations. He decided to work on machine learning and statistics with Prof Jinchi Lv at Marshall School of Business, USC.\nI also asked him why he chose to join a business school for research and not an engineering school, as his research area is more theoretical than most researchers\u0026rsquo;. He believed statisticians work on applied real-world problems; they\u0026rsquo;re not always in engineering schools. They are either biostatisticians working in arts and sciences schools or analysts working at business schools.\n\u0026gt; Machine learning is statistics. I don\u0026rsquo;t get it when people disguise the beautiful mathematical equations and proofs with a coded blackbox.\nFor a large part, he believes, and I resonate with him, computer science research on machine learning and statistics-based research on machine learning are related. Though their favourite journals are different, they solve the same problem. They are different sides of the same die.\nFor a large part, the research questions that he worked on during his PhD came from his advisor. He was lucky to be involved in multiple areas of research. Even today, his work during his PhD drives his primary research interests. The problems he is working on \u0026mdash; knockoff designs, feature selection and model selection \u0026mdash; are heavily researched but far from settled.\nHe was interested in coding, so he implemented his research methods and simulation studies in R.\nHis Research: Knockoffs We also discussed his research, and he patiently explained the mathematical parts to me. I found his paper1 lucidly written but containing dense mathematical notations, many of which I saw for the first time. Knockoffs are methods to reduce the number of features. Finding a good feature that makes theoretical and practical sense is called discovery. The benefit of using knockoffs is that they are more immune to false discovery, i.e., flagging a feature that looks important but isn\u0026rsquo;t essential.\nThe goal is to increase true discovery rates or reduce false discovery rates. 
Knockoffs create a replica variable for the variable in question. Then, the RANK algorithm tests the importance of the original variable by analysing its knockoff. Using that algorithm, we know how many false discoveries we are making. Thus, tweaking the knockoffs can bound our false discovery rate, which is remarkably useful.\nPublishing I asked him another more straightforward question, \u0026ldquo;how do you decide which journal to publish in?\u0026rdquo;. \u0026ldquo;It is a difficult question\u0026rdquo;, he chuckled. Typically, he looks at the articles already published in a journal to gauge the kind of papers they accept. Once we start reading journals regularly, we get a closer hint of which papers each journal considers \u0026ldquo;interesting\u0026rdquo;.\nHis advice was to read the papers published recently by that journal and then decide if your article is like theirs. Old articles by that journal might not be suitable; new editors and managers change their research directions often. Some journals require that papers have a rigorous discussion on theoretical aspects while others focus on simulation results.\nThe methods to communicate research findings have changed significantly over the century. Earlier, it was centered around select universities, some Royal Societies and occasional private institutions like RAND. Today, the dynamics have significantly changed. The growth of computing has been an important catalyst too. Blogs and personal websites have supported individual control. Publications are searched through Google Scholar instead of through a librarian. Package documentation websites are more read than the foundational papers on the topic. I wondered if journal publications are going to lose their importance to quality online literature available for free.\nI asked him if he considers the possibility that research published on the open internet on blogs and Github would eat away the fiefdom of academic journals. He doesn\u0026rsquo;t think that is a possibility. The system does have some issues: peer-review isn\u0026rsquo;t a golden standard. But that doesn\u0026rsquo;t mean blogs and Github can rule. arXiv and SSRN may work better in some instances, significantly better than low-tier journals. However, the top journals, which are a handful and have very high impact, aren\u0026rsquo;t going away anytime soon. Blogs would help popularise research, but they cannot replace the top journals.\nDisseminating Research: Methods as Packages Since Prof Emre\u0026rsquo;s research focussed on developing methods, he built a large codebase including many novel functions from his study. I asked him if he considered publishing those functions as a package. That would enable others to use his method and make it more accessible and popular among practitioners.\nHe says researchers sometimes publish their entire codebase online. Other times, those who want to apply the methods must implement them themselves or ask the authors for their code. Applied researchers would have to modify existing theoretical approaches to suit their use cases.\nHe doesn\u0026rsquo;t publish his functions as a package because he doesn\u0026rsquo;t have enough time to prepare them. Writing packages for the methods he developed takes a back seat with the time crunch. I understand why this happens \u0026mdash; little incentive \u0026mdash; but dislike the result. Prof Emre agrees that more people would read, use, and cite the work if the package were made available. 
However, such works are usually done by large teams of researchers where some graduate students convert the codebase to a software package full time.\nAdvice for Younger Self in Grad School Start your research early.\nTake more classes outside of my major.\nFan, Y., Demirkaya, E., Li, G., \u0026amp; Lv, J. (2019). RANK: Large-scale inference with graphical nonlinear knockoffs. Journal of the American Statistical Association. arXiv.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/how-to-decide-what-to-research/","summary":"A Casual Interview of Prof Emre Demirkaya","title":"How to Decide What to Research?"},{"content":"I interviewed Prof Mike Galbreth from my department at the University of Tennessee. This was part of my seminar course on research by Prof Sean Willems (my advisor too 🚀). This essay is my reflection on our conversations.\nInteresting Problems I ponder how professors or field experts think about research questions. How do they identify if something is \u0026ldquo;interesting\u0026rdquo; to follow for research? Cachon (2012)1 presented a variety of methods to decide what is considered interesting in operations management. I started Prof Mike\u0026rsquo;s interview with the same question.\nThe first parameter he checks is the relevance of the topic for the industry. For example, he attended conferences and met several practitioners concerned about reverse logistics. People were returning more items bought over e-commerce than ever. Companies found it costly as customers expected free returns. He decided to study the problem in detail, which became his research area.\nAnother method to look for insights from the industry is to know people. People working in retail routinely encounter non-trivial problems and look to academia for inspiration. By staying in touch with them, we can understand and find exciting research questions.\nApproach to Research Continuing to the next question, I asked his general research approach. What\u0026rsquo;s the next step after identifying an interesting problem? He explained his two-pronged procedure of finding research problems.\nFirst, he communicates with the industry professionals to exchange as many details as possible on the problem. Why do they care about the situation? What is the quantified harm or benefit to the company? What circumstances cause the pain? Is it industry-wide or specific to this company? How are they handling it right now? Most importantly, would they be willing to share data for him to try a solution?\nSecond, he looks at the academic literature for existing solutions. He always finds that some tangentially related solutions exist but cannot be applied without significant novel modifications. Studying the literature is also crucial for another critical activity: publishing. We cannot publish a paper on something that only we find interesting. The paper needs to justify why the issue is worth solving and how your solution sits among the myriad of existing solutions.\nHe added, \u0026ldquo;People decide by page four if they want to reject a paper. You must get their attention with the abstract and introduction while maintaining sufficient rigour to keep them engaged.\u0026rdquo; That was enlightening for me. So far, I had considered \u0026ldquo;methods\u0026rdquo; the zest of a paper and assumed it to be the most engaging part. 
However, I realized people might not even read that section if my abstract and introduction were not compelling enough.\nMental Toolkits With so many different problems, I was curious about how he built his mental toolkit to deal with them. Every issue would require experience with a new method: game theory, network optimization, auction theory, econometric theories and so on. How does he keep himself updated on all such methods?\nProf Mike explained that there are several aspects to it. He built a solid foundation during his doctoral studies by taking many economics and game theory courses \u0026mdash; which are helping him with his current research. However, the most common and practically necessary skill is self-teaching essential tools. One way to self-learn is to find a well-written paper in Management Science on the method. This paper would introduce him to the topic while also engaging with a practical case study of its application.\nSometimes, we find people with the specific skillset required to do those analyses. For example, if he feels limited in the mathematical rigour required for a problem, he collaborates with researchers who are good at such mathematical tools. His advice is to pick up many tool-related courses during graduate school, as this is likely the only time I will have to learn them.\nFinding Your Niche It is also essential to find an active research area. A booming research field with new tools and advances every day would always be more rewarding than otherwise. A mature area will have fewer novel topics and more incremental research. However, a field like healthcare or food waste will continue booming.\nYou might encounter significant resistance when you are a pioneer in a field. You might hear back from the editor, \u0026ldquo;Food waste? No one studies food waste.\u0026rdquo; However, once you convince the academics that the topic is exciting and impactful, you\u0026rsquo;d be in a great place. His advice is to find issues that haven\u0026rsquo;t been thoroughly researched, rather than those requiring only incremental improvements.\nFor example, supply chain contracts have been researched for years. We can make significant contributions, but those would still be incremental. In contrast, a field like diversity in operations \u0026mdash; gender diversity, ethnic diversity \u0026mdash; hasn\u0026rsquo;t been well researched, and its impacts are not well known to practitioners or even academics.\nAcademic Publishing Moving on, I asked him about his opinions on publishing: how editors determine the general direction of a journal and even the research field. The editors decide which papers get selected into the journal. It might be good to research the papers that an editor accepts and target our article accordingly. Sometimes these journals could look like a closed community \u0026mdash; editors taking work only from their friends \u0026mdash; but that is a rarity. Most editors are honest and look for excellent outcomes.\nThe methods to communicate research findings have changed significantly over the century. Earlier, it was centered around select universities, some Royal Societies and occasional private institutions like RAND. Today, the dynamics have significantly changed. The growth of computing has been an important catalyst too. Blogs and personal websites have supported individual control. Publications are searched through Google Scholar instead of through a librarian. Package documentation websites are more read than the foundational papers on the topic. 
I wondered if journal publications are going to lose their importance to quality online literature available for free.\nI asked him another question about online publishing. With the advent of the internet, more people publish their research, like software, directly online through their website or Github. This bypasses the entire chain of checks: peer-review, limitation on acceptances, among others. There will be zero rejection as we can directly put our work online. Do you think this \u0026ldquo;open knowledge\u0026rdquo; will consume \u0026ldquo;locked\u0026rdquo; knowledge restricted to journals? In other words, does he consider this new wave of publishing a threat to academic journals?\nHe gave a thoughtful answer: it is hard to imagine academia without publication journals. These \u0026ldquo;open knowledge\u0026rdquo; projects gain sudden popularity but do not have a lasting impact. This medium does not have a mechanism to check for scientific rigour. Although a lower-ranked journal might lose its impact, it is unlikely that the top-10 journals in the field would be less important any time soon.\nHe reiterated what we learnt in our class: publishing in one of the top-4 journals should be the goal as a PhD student. That is what people would look for when hiring for research. Classes and conferences are essential, but the main goal is to put a paper in one of those top-4 journals. He explained how if someone didn\u0026rsquo;t have an article in one of the top journals, they would likely not receive an invitation for a job talk.\nTherefore, it is necessary to start research early.\nHis Research: Product Returns We also discussed his research paper \u0026ldquo;How much do online consumers really value free product returns? Evidence from eBay\u0026rdquo;, published in 2017 in the Journal of Operations Management.2 Interestingly, the research found that having a forward shipping charge (delivery fees in consumer-speak) impacts what people order. Selling something for $100 with a $10 delivery fee is inferior to the same product priced at $110 with no delivery fee.\nBased on the quoted product return fees, I learned how consumers behave irrationally \u0026mdash; but not as irrationally as expected. A product sold with a free return policy was considered more friendly than a product that considered all sales final. However, this incremental value was not as enormous as hypothesized \u0026mdash; lingering around 5% for most products. This insight was critical for practitioners: they now know the impact of returns is not as high as expected. Methodologically, they used regression analysis, controlling for variables such as product price, to reach this conclusion.\nDisseminating Research to Industry Excited by the novelty of this conclusion, I asked if he knew of any company or individual seller who used his findings to change their return policy. He said he presented his results to Home Depot and numerous conferences. He isn\u0026rsquo;t aware of any company currently setting its policy based on his research, though.\nThen, I asked him how he shares his results with industry professionals \u0026mdash; it is unlikely that they would be reading the Journal of Operations Management. He shares that he does a few critical things to disseminate research. First, he participates in many conferences where he meets practitioners and tells them about his research. 
Second, he writes a column titled \u0026ldquo;View from Academia\u0026rdquo; in practitioner magazines, simplifying conclusions from his studies.\nThird \u0026mdash; something he would like to do more of \u0026mdash; is collaborating with the PR department of the University of Tennessee, which could hire professional writers to convert dense research papers into executive summaries for alumni magazines and online outlets. However, taking this last step is not very common. People should do that more often, but academics are not incentivized to do so.\nAdvice for Younger Self in Grad School Take as many tool classes as you can while not overloading yourself.\nStart on research as early as possible. Don\u0026rsquo;t get to the third year and wake up one day realising you need to do research.\nFind a professor with whom you enjoy working. You will be spending a lot of time together, and you need to be joyful working with them.\nFind a research problem that is active and not yet completely established.\nEnjoy your time while you are still a student. This is probably the last time you will have fun without the full-time responsibilities of a professor or researcher.\nCachon, G. P. (2012). What is interesting in operations management?. Manufacturing \u0026amp; Service Operations Management, 14(2), 166-169. PDF.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nShang, G., Pekgün, P., Ferguson, M., \u0026amp; Galbreth, M. (2017). How much do online consumers really value free product returns? Evidence from eBay. Journal of Operations Management, 53, 45-62. SSRN.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/interesting-problems-and-where-to-find-them/","summary":"A Casual Interview of Prof Michael Galbreth","title":"Interesting Problems and Where to Find Them?"},{"content":" Without garlic I simply would not care to live code. \u0026mdash; Louis Diat1\nThese are some functions that I use very frequently in my projects. There are three categories of functions: exploratory functions to check missing values and describe data, visualisation functions for my ggplot2 themes and manipulative functions to modify selected variables.\nInstalling the Package If you don\u0026rsquo;t have devtools, install that first. devtools provides the function install_github() which can be used to install R packages hosted on Github.\ninstall.packages(\u0026#34;devtools\u0026#34;)\ndevtools::install_github(\u0026#34;harshvardhaniimi/garlic\u0026#34;)\nlibrary(garlic) Exploratory Functions There are three exploratory functions. This vignette demonstrates how exploratory functions like show_in_excel(), which_na() and which_this() can be used.\nlibrary(garlic) Examples df = iris Show a data frame in MS Excel I found this function on Twitter but can\u0026rsquo;t find that tweet anymore. (Update: Bruno Rodrigues created this function. Here\u0026rsquo;s the tweet.)\nshow_in_excel(df) It can also be used with pipes.\nlibrary(dplyr) df %\u0026gt;% show_in_excel() Which values are missing? I\u0026rsquo;m initialising a vector from 1 to 10 with the fifth value missing (NA).\nx = c(1:4, NA, 6:10) Using which_na(), I can find the index of elements in the vector that are NA.\nwhich_na(x) ## [1] 5 Which element is this? It can identify values that satisfy a criterion. 
It is kind of a wrapper around dplyr\u0026rsquo;s filter().\nwhich_this(iris, \u0026#34;Sepal.Length \u0026gt; 7\u0026#34;) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 7.1 3.0 5.9 2.1 virginica ## 2 7.6 3.0 6.6 2.1 virginica ## 3 7.3 2.9 6.3 1.8 virginica ## 4 7.2 3.6 6.1 2.5 virginica ## 5 7.7 3.8 6.7 2.2 virginica ## 6 7.7 2.6 6.9 2.3 virginica ## 7 7.7 2.8 6.7 2.0 virginica ## 8 7.2 3.2 6.0 1.8 virginica ## 9 7.2 3.0 5.8 1.6 virginica ## 10 7.4 2.8 6.1 1.9 virginica ## 11 7.9 3.8 6.4 2.0 virginica ## 12 7.7 3.0 6.1 2.3 virginica Manipulative Functions There are two mutating functions that modify data frames in a certain way. na_rm_feature() is used for removing observations based on a single variable. na_to_zero() converts missing values to zero.\nlibrary(garlic) Examples Removing Rows Based on Missing Values in a Column Sometimes, I do not want to na.omit() because it will treat all features equally. I want to check values only for one column, while removing those observations.\n# First ten rows of iris dataset df = iris[1:10,] df ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa # Setting second sepal width to NA df$Sepal.Width[2] = NA df ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 NA 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa # Removing that observation df = na_rm_feature(df, \u0026#34;Sepal.Width\u0026#34;) df ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa Changing Missing Values to Zero This function converts missing values to zero.\n# First ten rows of iris dataset df = iris[1:10,] df ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa # Setting second sepal width to NA df$Sepal.Width[2] = NA df ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 NA 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa na_to_zero(df$Sepal.Width) ## [1] 3.5 0.0 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ggserif() Theme I converted axes to directed arrows and made background grid more transparent. In academic publications, serif fonts are often preferred. 
Thus, the theme uses serif fonts.\nlibrary(garlic) library(ggplot2) library(dplyr) ## ## Attaching package: \u0026#39;dplyr\u0026#39; ## The following objects are masked from \u0026#39;package:stats\u0026#39;: ## ## filter, lag ## The following objects are masked from \u0026#39;package:base\u0026#39;: ## ## intersect, setdiff, setequal, union library(patchwork) This theme upgrades basic ggplot2 themes. It is particularly suitable for academic publications that require serif fonts for labels and arrowed axes.\nVisually Comparing with Default, Linedraw and Dark Themes Among the themes available in ggplot2, linedraw is my favourite.\np1 = iris %\u0026gt;% ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() + labs(title = \u0026#34;Default Theme\u0026#34;) p2 = iris %\u0026gt;% ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() + labs(title = \u0026#34;theme_linedraw()\u0026#34;) + theme_linedraw() p3 = iris %\u0026gt;% ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() + labs(title = \u0026#34;theme_dark()\u0026#34;) + theme_dark() p4 = iris %\u0026gt;% ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() + labs(title = \u0026#34;ggserif()\u0026#34;) + ggserif() # Using patchwork, I can easily stitch these plots together. p1 / p2 / p3 / p4 Setting theme globally You can set the theme globally for all plots using the following command.\ntheme_set(ggserif()) Citation Harshvardhan, M. (March 2022). garlic: Some R Functions I Use Rather Frequently. v0.1.0 (r-package). GitHub, Zenodo. https://doi.org/10.5281/zenodo.6331095\nLouis Diat was a French-American chef. I added the quote because it sounds cool. I called the package garlic simply because I love the taste of garlic.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/garlic-some-r-functions-i-use-rather-frequently/","summary":"My personal R package for custom functions","title":"garlic: Some R Functions I Use Rather Frequently"},{"content":" Titanic was a major tragedy. In this course project for BZAN 645: Machine Learning at University of Tennessee, I tried to predict whether a particular individual would survive the tragedy. Why Titanic Dataset? Because it was a course requirement. No offence to the hundreds of souls who died, but the dataset is easy to get started with.
Every course instructor loves it.\nI used tidymodels to fit xgboost and logistic regression models on bootstrapped samples to predict the outcome.\nLoading Libraries and Dataset # Setting Parallel Processing to use six out of eight cores # Unix and macOS only library(doMC) ## Loading required package: foreach ## Loading required package: iterators ## Loading required package: parallel registerDoMC(cores = 6) # To reset, use # registerDoSEQ() library(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4 ## ✓ tibble 3.1.6 ✓ dplyr 1.0.8.9000 ## ✓ tidyr 1.2.0 ✓ stringr 1.4.0 ## ✓ readr 2.1.2 ✓ forcats 0.5.1 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## x purrr::accumulate() masks foreach::accumulate() ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() ## x purrr::when() masks foreach::when() library(tidymodels) ## Registered S3 method overwritten by 'tune': ## method from ## required_pkgs.model_spec parsnip ## ── Attaching packages ────────────────────────────────────── tidymodels 0.1.4 ── ## ✓ broom 0.7.10 ✓ rsample 0.1.1 ## ✓ dials 0.0.10 ✓ tune 0.1.6 ## ✓ infer 1.0.0 ✓ workflows 0.2.4 ## ✓ modeldata 0.1.1 ✓ workflowsets 0.1.0 ## ✓ parsnip 0.1.7 ✓ yardstick 0.0.9 ## ✓ recipes 0.1.17 ## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ── ## x purrr::accumulate() masks foreach::accumulate() ## x scales::discard() masks purrr::discard() ## x dplyr::filter() masks stats::filter() ## x recipes::fixed() masks stringr::fixed() ## x dplyr::lag() masks stats::lag() ## x yardstick::spec() masks readr::spec() ## x recipes::step() masks stats::step() ## x purrr::when() masks foreach::when() ## • Search for functions across packages at https://www.tidymodels.org/find/ library(plotly) ## ## Attaching package: 'plotly' ## The following object is masked from 'package:ggplot2': ## ## last_plot ## The following object is masked from 'package:stats': ## ## filter ## The following object is masked from 'package:graphics': ## ## layout # Setting custom theme theme_h = function(base_size = 14) { theme_bw(base_size = base_size) %+replace% theme( # Specify plot title plot.title = element_text(size = rel(1), face = \u0026#34;bold\u0026#34;, family=\u0026#34;serif\u0026#34;, margin = margin(0,0,5,0), hjust = 0), # Specifying grid and border panel.grid.minor = element_blank(), panel.border = element_blank(), # Specify axis details axis.title = element_text(size = rel(0.85), face = \u0026#34;bold\u0026#34;, family=\u0026#34;serif\u0026#34;), axis.text = element_text(size = rel(0.70), family=\u0026#34;serif\u0026#34;), axis.line = element_line(color = \u0026#34;black\u0026#34;, arrow = arrow(length = unit(0.3, \u0026#34;lines\u0026#34;), type = \u0026#34;closed\u0026#34;)), # Specify legend details legend.title = element_text(size = rel(0.85), face = \u0026#34;bold\u0026#34;, family=\u0026#34;serif\u0026#34;), legend.text = element_text(size = rel(0.70), face = \u0026#34;bold\u0026#34;, family=\u0026#34;serif\u0026#34;), legend.key = element_rect(fill = \u0026#34;transparent\u0026#34;, colour = NA), legend.key.size = unit(1.5, \u0026#34;lines\u0026#34;), legend.background = element_rect(fill = \u0026#34;transparent\u0026#34;, colour = NA), # Remove default background strip.background = element_rect(fill = \u0026#34;#17252D\u0026#34;, color = \u0026#34;#17252D\u0026#34;), strip.text = element_text(size = rel(0.85), face = \u0026#34;bold\u0026#34;, color =
\u0026#34;white\u0026#34;, margin = margin(5,0,5,0), family=\u0026#34;serif\u0026#34;) ) } theme_set(theme_h()) You can download the datasets here.\nxfun::embed_file(path = \u0026#34;/Users/harshvardhan/Documents/UTK/Classes/Spring 2022/BZAN 645 Machine Learning/Homeworks/HW03/titanic-tidymodels/train.csv\u0026#34;) Download train.csv\nxfun::embed_file(path = \u0026#34;/Users/harshvardhan/Documents/UTK/Classes/Spring 2022/BZAN 645 Machine Learning/Homeworks/HW03/titanic-tidymodels/test.csv\u0026#34;) Download test.csv\ntraining = read_csv(\u0026#34;/Users/harshvardhan/Documents/UTK/Classes/Spring 2022/BZAN 645 Machine Learning/Homeworks/HW03/titanic-tidymodels/train.csv\u0026#34;) ## Rows: 891 Columns: 12 ## ── Column specification ──────────────────────────────────────────────────────── ## Delimiter: \u0026quot;,\u0026quot; ## chr (5): Name, Sex, Ticket, Cabin, Embarked ## dbl (7): PassengerId, Survived, Pclass, Age, SibSp, Parch, Fare ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. testing = read_csv(\u0026#34;/Users/harshvardhan/Documents/UTK/Classes/Spring 2022/BZAN 645 Machine Learning/Homeworks/HW03/titanic-tidymodels/test.csv\u0026#34;) ## Rows: 418 Columns: 11 ## ── Column specification ──────────────────────────────────────────────────────── ## Delimiter: \u0026quot;,\u0026quot; ## chr (5): Name, Sex, Ticket, Cabin, Embarked ## dbl (6): PassengerId, Pclass, Age, SibSp, Parch, Fare ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. training ## # A tibble: 891 × 12 ## PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin ## \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; ## 1 1 0 3 Braun… male 22 1 0 A/5 2… 7.25 \u0026lt;NA\u0026gt; ## 2 2 1 1 Cumin… fema… 38 1 0 PC 17… 71.3 C85 ## 3 3 1 3 Heikk… fema… 26 0 0 STON/… 7.92 \u0026lt;NA\u0026gt; ## 4 4 1 1 Futre… fema… 35 1 0 113803 53.1 C123 ## 5 5 0 3 Allen… male 35 0 0 373450 8.05 \u0026lt;NA\u0026gt; ## 6 6 0 3 Moran… male NA 0 0 330877 8.46 \u0026lt;NA\u0026gt; ## 7 7 0 1 McCar… male 54 0 0 17463 51.9 E46 ## 8 8 0 3 Palss… male 2 3 1 349909 21.1 \u0026lt;NA\u0026gt; ## 9 9 1 3 Johns… fema… 27 0 2 347742 11.1 \u0026lt;NA\u0026gt; ## 10 10 1 2 Nasse… fema… 14 1 0 237736 30.1 \u0026lt;NA\u0026gt; ## # … with 881 more rows, and 1 more variable: Embarked \u0026lt;chr\u0026gt; testing ## # A tibble: 418 × 11 ## PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked ## \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 892 3 Kelly… male 34.5 0 0 330911 7.83 \u0026lt;NA\u0026gt; Q ## 2 893 3 Wilke… fema… 47 1 0 363272 7 \u0026lt;NA\u0026gt; S ## 3 894 2 Myles… male 62 0 0 240276 9.69 \u0026lt;NA\u0026gt; Q ## 4 895 3 Wirz,… male 27 0 0 315154 8.66 \u0026lt;NA\u0026gt; S ## 5 896 3 Hirvo… fema… 22 1 1 31012… 12.3 \u0026lt;NA\u0026gt; S ## 6 897 3 Svens… male 14 0 0 7538 9.22 \u0026lt;NA\u0026gt; S ## 7 898 3 Conno… fema… 30 0 0 330972 7.63 \u0026lt;NA\u0026gt; Q ## 8 899 2 Caldw… male 26 1 1 248738 29 
\u0026lt;NA\u0026gt; S ## 9 900 3 Abrah… fema… 18 0 0 2657 7.23 \u0026lt;NA\u0026gt; C ## 10 901 3 Davie… male 21 2 0 A/4 4… 24.2 \u0026lt;NA\u0026gt; S ## # … with 408 more rows Here’s a brief overview of the variables:\nPassengerId identifies the passenger. This is not useful for our model. Survived is a binary variable indicating if the passenger survived. Pclass tells us the class of the passenger. We will have to perform one-hot encoding for this variable. Name is the name of the passenger. This is not useful for our model. Sex is the sex of the passenger. This will also need to be one-hot encoded. Age is the age of the passenger; decimal values are estimated ages. In our model, we will treat it as a continuous variable. SibSp is the number of siblings or spouses on the ship. (I wonder if Jack counted as Rose’s spouse, but probably not, as their relationship only began on the ship.) Parch is the number of parents or children aboard the ship. Ticket is a character variable with the ticket’s serial number. Fare is the amount paid for the ticket. Cabin numbers are largely missing. Embarked is the location where the passenger boarded the ship. This will also need to be one-hot encoded. Exploring Dataset Missing Values missing_df = function(df) { nf = ncol(df) miss_rate = numeric(nf) for (i in 1:nf) { miss_rate[i] = sum(is.na(df[,i]))/nrow(df) } ll = tibble(\u0026#34;names\u0026#34; = names(df), \u0026#34;miss_rate\u0026#34; = miss_rate) return(ll) } missing_df(training) ## # A tibble: 12 × 2 ## names miss_rate ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 PassengerId 0 ## 2 Survived 0 ## 3 Pclass 0 ## 4 Name 0 ## 5 Sex 0 ## 6 Age 0.199 ## 7 SibSp 0 ## 8 Parch 0 ## 9 Ticket 0 ## 10 Fare 0 ## 11 Cabin 0.771 ## 12 Embarked 0.00224 Age has many missing values, and Cabin has even more. Embarked has a few missing values. I should probably drop Cabin from the analysis and impute values for the other two.\nSo, three alternatives:\nDelete the observations with missing values, Impute missing values with the mean (for numeric) and mode (for categorical), Remove variables Age, Cabin and Embarked altogether. Number of Survivors First, we will test for class imbalance. Did more people survive than die, or vice versa?\ntraining %\u0026gt;% count(Survived) ## # A tibble: 2 × 2 ## Survived n ## \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; ## 1 0 549 ## 2 1 342 The classes are not badly imbalanced: 342 people survived; 549 didn’t.\nClass of Passengers (and Survivors) training %\u0026gt;% count(Pclass) ## # A tibble: 3 × 2 ## Pclass n ## \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; ## 1 1 216 ## 2 2 184 ## 3 3 491 Most passengers were travelling in the third class.\ntraining %\u0026gt;% count(Pclass, Survived) %\u0026gt;% ggplot(aes(x = Pclass, y = n, fill = factor(Survived))) + geom_bar(position = \u0026#34;stack\u0026#34;, stat=\u0026#34;identity\u0026#34;) + labs(x = \u0026#34;Class of Passenger\u0026#34;, y = \u0026#34;Number of Passengers\u0026#34;, fill = \u0026#34;Survived?\u0026#34;) We can safely say that not many passengers from class 3 survived.\nAge of Passengers (and Survivors) training %\u0026gt;% ggplot(aes(x = Age)) + geom_histogram(binwidth = 5) ## Warning: Removed 177 rows containing non-finite values (stat_bin). Most people were between 20 and 40 years old — so the travellers were largely young. At the same time, we also notice that age is approximately normally distributed. Thus, we can impute the missing values with the mean.
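As an aside (my addition, not part of the original homework): the same imputation plan can also be written as preprocessing steps inside a tidymodels recipe, so the training-set statistics are stored once and applied to new data automatically. This is only a sketch; it assumes a recipes version that provides step_impute_mean() and step_impute_mode().\n# Sketch: recipe-based imputation, an alternative to the manual approach used below\nlibrary(recipes)\nrec = recipe(Survived ~ ., data = training) %\u0026gt;%\n step_rm(Cabin) %\u0026gt;% # drop the mostly-missing Cabin column\n step_impute_mean(Age) %\u0026gt;% # numeric: fill NA with the training-set mean\n step_impute_mode(Embarked) # nominal: fill NA with the training-set mode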
Let’s see the age-wise distribution of survival.\np = training %\u0026gt;% ggplot(aes(x = Age, fill = factor(Survived))) + geom_histogram(binwidth = 5) + labs(x = \u0026#34;Age of Passenger\u0026#34;, y = \u0026#34;Number of Passengers\u0026#34;, fill = \u0026#34;Survived?\u0026#34;) ggplotly(p) ## Warning: Removed 177 rows containing non-finite values (stat_bin). So, the young ones survived — those under 15 years of age. This was probably because of the crew’s instinct to save women and children first. (You can hover over the plot to know more.)\nNumber of Siblings/Spouses and Parents training %\u0026gt;% count(SibSp) %\u0026gt;% ggplot(aes(x = SibSp, y = n)) + geom_col() + labs(x = \u0026#34;Number of Siblings\u0026#34;, y = \u0026#34;Number of Passengers with `x` Siblings\u0026#34;) Most had no siblings; some had one or more. Interestingly, no one had six or seven siblings.\ntraining %\u0026gt;% count(Parch) ## # A tibble: 7 × 2 ## Parch n ## \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; ## 1 0 678 ## 2 1 118 ## 3 2 80 ## 4 3 5 ## 5 4 4 ## 6 5 5 ## 7 6 1 training %\u0026gt;% count(Parch) %\u0026gt;% ggplot(aes(x = Parch, y = n)) + geom_col() + labs(x = \u0026#34;Number of Parents / Children\u0026#34;, y = \u0026#34;Number of Passengers with `x` Parents / Children\u0026#34;) Most passengers were travelling alone. Some were travelling with 1 or 2 parents / children. Very few were travelling with three or more parents / children.\nWhere did they Embark on their Journey? training %\u0026gt;% count(Embarked) ## # A tibble: 4 × 2 ## Embarked n ## \u0026lt;chr\u0026gt; \u0026lt;int\u0026gt; ## 1 C 168 ## 2 Q 77 ## 3 S 644 ## 4 \u0026lt;NA\u0026gt; 2 Most embarked at Southampton. Cherbourg was the second most popular boarding point; Queenstown was the least popular. There are two missing values that we can fill with the mode. Let’s group them by survival.\ntraining %\u0026gt;% count(Embarked, Survived) ## # A tibble: 7 × 3 ## Embarked Survived n ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; ## 1 C 0 75 ## 2 C 1 93 ## 3 Q 0 47 ## 4 Q 1 30 ## 5 S 0 427 ## 6 S 1 217 ## 7 \u0026lt;NA\u0026gt; 1 2 training %\u0026gt;% count(Embarked, Survived) %\u0026gt;% ggplot(aes(x = Embarked, y = n, fill = factor(Survived))) + geom_bar(position = \u0026#34;stack\u0026#34;, stat=\u0026#34;identity\u0026#34;) + labs(x = \u0026#34;Boarding Point\u0026#34;, y = \u0026#34;Number of Passengers\u0026#34;, fill = \u0026#34;Survived?\u0026#34;) There seems to be little relationship between where passengers started their journey and whether they survived; the proportions do not change much.\nNow, let’s start the cool part: machine learning.\nWrangling Data for Machine Learning We effectively have only one labelled dataset: training. The test set doesn’t have labels, and the only way to check predictions on it is to upload them to Kaggle, which I will do at the end. Right now, I need to split my data into two sets: training and testing.
By default, initial_split() does a 75/25 split.\nBut first, I will remove Cabin — which is mostly missing.\nset.seed(0) training$Survived = factor(training$Survived) training$Pclass = factor(training$Pclass) training = training %\u0026gt;% select(-PassengerId, -Cabin, -Name, -Ticket) split = initial_split(training, strata = Survived) tit_train = training(split) tit_test = testing(split) Let’s Fill The Missing Values How many values are missing right now in the training dataset?\nsum(is.na(tit_train)) ## [1] 137 There are a lot of missing values. Let’s see which variables are missing.\nmissing_df(tit_train) ## # A tibble: 8 × 2 ## names miss_rate ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Survived 0 ## 2 Pclass 0 ## 3 Sex 0 ## 4 Age 0.204 ## 5 SibSp 0 ## 6 Parch 0 ## 7 Fare 0 ## 8 Embarked 0.00150 Age and Embarked. As I discussed earlier, I can fill them in with the mean and mode.\n# Function for mode (borrowed from https://stackoverflow.com/questions/2547402/how-to-find-the-statistical-mode) Mode = function(x) { ux = unique(x) ux[which.max(tabulate(match(x, ux)))] } mean_age = mean(tit_train$Age, na.rm = T) mode_embark = Mode(tit_train$Embarked) # Fill missing values with mean fill_with_mean = function(x) { x[is.na(x)] = mean(x, na.rm = T) return (x) } # Fill missing values with mode fill_with_mode = function(x) { x[is.na(x)] = Mode(x) return (x) } tit_train$Age = fill_with_mean(tit_train$Age) tit_train$Embarked = fill_with_mode(tit_train$Embarked) sum(is.na(tit_train)) ## [1] 0 So, all missing values are gone!\nNow, let’s do the same transformations to tit_test. Note that we will use the training mean and mode for this purpose.\ntit_test$Age[is.na(tit_test$Age)] = mean_age tit_test$Embarked[is.na(tit_test$Embarked)] = mode_embark sum(is.na(tit_test)) ## [1] 0 Logistic Regression and xgboost Tree The first method I want to try is a generalised linear model: logistic regression. Second, I want to try an xgboost tree.\nNote that our sample size is only 667. For most methods, this is very small.
Thus, I will use bootstrapped samples.\ntit_boot = bootstraps(tit_train, strata = Survived) tit_boot ## # Bootstrap sampling using stratification ## # A tibble: 25 × 2 ## splits id ## \u0026lt;list\u0026gt; \u0026lt;chr\u0026gt; ## 1 \u0026lt;split [667/246]\u0026gt; Bootstrap01 ## 2 \u0026lt;split [667/244]\u0026gt; Bootstrap02 ## 3 \u0026lt;split [667/249]\u0026gt; Bootstrap03 ## 4 \u0026lt;split [667/242]\u0026gt; Bootstrap04 ## 5 \u0026lt;split [667/231]\u0026gt; Bootstrap05 ## 6 \u0026lt;split [667/239]\u0026gt; Bootstrap06 ## 7 \u0026lt;split [667/257]\u0026gt; Bootstrap07 ## 8 \u0026lt;split [667/230]\u0026gt; Bootstrap08 ## 9 \u0026lt;split [667/238]\u0026gt; Bootstrap09 ## 10 \u0026lt;split [667/249]\u0026gt; Bootstrap10 ## # … with 15 more rows So, this created 25 bootstrap resamples. Each analysis set has 667 rows; the out-of-bag assessment sets (the second number in each split) vary in size.\nEngine Logistic Regression # Specifying Recipe for converting nominal to binary glm_rec = recipe(Survived ~ ., data = tit_train) %\u0026gt;% step_dummy(all_nominal_predictors()) # Logistic Regression glm_spec = logistic_reg() %\u0026gt;% set_engine(\u0026#34;glm\u0026#34;) xgboost Classification # Specify Recipe xg_rec = recipe(Survived ~ ., data = tit_train) %\u0026gt;% step_dummy(all_nominal_predictors()) # Specify Engine xg_model = boost_tree(mode = \u0026#34;classification\u0026#34;, # binary response trees = tune(), mtry = tune(), tree_depth = tune(), learn_rate = tune(), loss_reduction = tune(), min_n = tune()) # parameters to be tuned Workflow Logistic Regression glm_wf = workflow() %\u0026gt;% add_model(glm_spec) %\u0026gt;% add_recipe(glm_rec) xgboost Classification The following cross-validation setup is needed to choose the appropriate xgboost model.\ncv_folds = vfold_cv(tit_train, v = 3, strata = Survived) c_metrics = metric_set(accuracy, sens, roc_auc) # Specify a baseline model control control = control_resamples(save_pred = TRUE, verbose = F) # Specify the workflow xg_wf = workflow() %\u0026gt;% add_model(xg_model) %\u0026gt;% add_recipe(xg_rec) Fitting the Models Logistic Regression Without bootstrapped samples:\nglm_rs_unboot = glm_wf %\u0026gt;% fit(data = tit_train) glm_rs_unboot ## ══ Workflow [trained] ══════════════════════════════════════════════════════════ ## Preprocessor: Recipe ## Model: logistic_reg() ## ## ── Preprocessor ──────────────────────────────────────────────────────────────── ## 1 Recipe Step ## ## • step_dummy() ## ## ── Model ─────────────────────────────────────────────────────────────────────── ## ## Call: stats::glm(formula = ..y ~ ., family = stats::binomial, data = data) ## ## Coefficients: ## (Intercept) Age SibSp Parch Fare Pclass_X2 ## 4.0983745 -0.0407578 -0.3150337 -0.0927634 0.0004301 -1.3108194 ## Pclass_X3 Sex_male Embarked_Q Embarked_S ## -2.3795645 -2.7701258 0.3385133 0.0321918 ## ## Degrees of Freedom: 666 Total (i.e. Null); 657 Residual ## Null Deviance: 888.3 ## Residual Deviance: 592.1 AIC: 612.1 Let’s check its accuracy and confusion matrix.\npredict(glm_rs_unboot, tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% conf_mat(truth = Survived, estimate = .pred_class) ## Truth ## Prediction 0 1 ## 0 346 74 ## 1 65 182 predict(glm_rs_unboot, tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% accuracy(truth = Survived, estimate = .pred_class) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 accuracy binary 0.792 So, accuracy is (346 + 182)/667 ≈ 79.2%. This is decent accuracy.
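As an aside (my addition): the same predictions can be piped into other yardstick metrics in exactly the same way. A quick sketch using the objects defined above:\n# Sensitivity and specificity for the unbootstrapped logistic fit\npredict(glm_rs_unboot, tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% sens(truth = Survived, estimate = .pred_class)\npredict(glm_rs_unboot, tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% spec(truth = Survived, estimate = .pred_class)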
Let’s see AUC.\npredict(glm_rs_unboot, tit_train, type = \u0026#34;prob\u0026#34;) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% roc_auc(.pred_0, truth = Survived) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 roc_auc binary 0.856 An AUC of 0.856 is not bad at all.\nWith bootstrapped samples:\nglm_rs = glm_wf %\u0026gt;% fit_resamples(resamples = tit_boot, control = control_resamples(save_pred = TRUE, verbose = F)) glm_rs ## # Resampling results ## # Bootstrap sampling using stratification ## # A tibble: 25 × 5 ## splits id .metrics .notes .predictions ## \u0026lt;list\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; ## 1 \u0026lt;split [667/246]\u0026gt; Bootstrap01 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 2 \u0026lt;split [667/244]\u0026gt; Bootstrap02 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 3 \u0026lt;split [667/249]\u0026gt; Bootstrap03 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 4 \u0026lt;split [667/242]\u0026gt; Bootstrap04 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 5 \u0026lt;split [667/231]\u0026gt; Bootstrap05 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 6 \u0026lt;split [667/239]\u0026gt; Bootstrap06 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 7 \u0026lt;split [667/257]\u0026gt; Bootstrap07 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 8 \u0026lt;split [667/230]\u0026gt; Bootstrap08 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 9 \u0026lt;split [667/238]\u0026gt; Bootstrap09 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## 10 \u0026lt;split [667/249]\u0026gt; Bootstrap10 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble\u0026gt; ## # … with 15 more rows glm_rs %\u0026gt;% collect_metrics() ## # A tibble: 2 × 6 ## .metric .estimator mean n std_err .config ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; ## 1 accuracy binary 0.791 25 0.00413 Preprocessor1_Model1 ## 2 roc_auc binary 0.844 25 0.00455 Preprocessor1_Model1 Logistic regression on the bootstrapped samples gives a mean accuracy of 79.1% with an AUC of 0.844. These resampled estimates sit just below the training-set numbers, which suggests the logistic model is barely overfitting.\nLet’s see its ROC curve.\nglm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(Survived, .pred_0) %\u0026gt;% autoplot() This is difficult to read, so I will create a plot manually.\nglm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(Survived, .pred_0) %\u0026gt;% ggplot(aes(1 - specificity, sensitivity, col = id)) + geom_abline(lty = 2, colour = \u0026#34;grey80\u0026#34;, size = 1.5) + geom_path(show.legend = FALSE, alpha = 0.5, size = 1.2) + coord_equal() The model looks pretty good, if you ask me.
Let’s create the final fit with all of the training data.\n# storing final glm model glm_fit = glm_wf %\u0026gt;% fit(data = tit_train) Check its metrics.\nglm_fit %\u0026gt;% predict(tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% conf_mat(truth = Survived, estimate = .pred_class) ## Truth ## Prediction 0 1 ## 0 346 74 ## 1 65 182 glm_fit %\u0026gt;% predict(tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% accuracy(truth = Survived, estimate = .pred_class) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 accuracy binary 0.792 glm_fit %\u0026gt;% predict(tit_train, type = \u0026#34;prob\u0026#34;) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% roc_auc(truth = Survived, estimate = .pred_0) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 roc_auc binary 0.856 xgboost Model xg_tune = xg_wf %\u0026gt;% tune_grid(cv_folds, metrics = c_metrics, control = control, grid = crossing(trees = 1000, mtry = c(3, 5, 8), tree_depth = c(5, 10, 15), learn_rate = c(0.01, 0.005), loss_reduction = c(0.01, 0.1, 1), min_n = c(2, 10, 25))) I’m manually specifying grid values to try. Let’s visualise the models.\nautoplot(xg_tune) Having a minimum node size of two gives a minor benefit in accuracy. Other than that, it is hard to tell which model is best, so I will use show_best() to find out.\nshow_best(xg_tune, metric = \u0026#34;roc_auc\u0026#34;) ## # A tibble: 5 × 12 ## mtry trees min_n tree_depth learn_rate loss_reduction .metric .estimator ## \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 3 1000 2 15 0.005 1 roc_auc binary ## 2 3 1000 2 5 0.005 1 roc_auc binary ## 3 3 1000 2 15 0.005 0.01 roc_auc binary ## 4 3 1000 2 5 0.005 0.01 roc_auc binary ## 5 3 1000 2 15 0.005 0.1 roc_auc binary ## # … with 4 more variables: mean \u0026lt;dbl\u0026gt;, n \u0026lt;int\u0026gt;, std_err \u0026lt;dbl\u0026gt;, .config \u0026lt;chr\u0026gt; show_best(xg_tune, metric = \u0026#34;accuracy\u0026#34;) ## # A tibble: 5 × 12 ## mtry trees min_n tree_depth learn_rate loss_reduction .metric .estimator ## \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 8 1000 2 5 0.005 1 accuracy binary ## 2 8 1000 2 10 0.005 0.01 accuracy binary ## 3 8 1000 2 15 0.005 1 accuracy binary ## 4 8 1000 2 15 0.01 0.01 accuracy binary ## 5 8 1000 2 15 0.01 0.1 accuracy binary ## # … with 4 more variables: mean \u0026lt;dbl\u0026gt;, n \u0026lt;int\u0026gt;, std_err \u0026lt;dbl\u0026gt;, .config \u0026lt;chr\u0026gt; The best model has an accuracy of 0.822 and an AUC of 0.861. The two metrics do not quite agree on the hyper-parameters (the top model by AUC uses mtry = 3, while the top model by accuracy uses mtry = 8), so I will select the best model by AUC.\nbest_model = select_best(xg_tune, metric = \u0026#34;roc_auc\u0026#34;) Let’s finalise the model and train it on the full training set.\nxg_fit = xg_wf %\u0026gt;% finalize_workflow(best_model) %\u0026gt;% fit(data = tit_train) ## [15:15:10] WARNING: amalgamation/../src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
xg_fit ## ══ Workflow [trained] ══════════════════════════════════════════════════════════ ## Preprocessor: Recipe ## Model: boost_tree() ## ## ── Preprocessor ──────────────────────────────────────────────────────────────── ## 1 Recipe Step ## ## • step_dummy() ## ## ── Model ─────────────────────────────────────────────────────────────────────── ## ##### xgb.Booster ## raw: 3 Mb ## call: ## xgboost::xgb.train(params = list(eta = 0.005, max_depth = 15, ## gamma = 1, colsample_bytree = 1, colsample_bynode = 0.333333333333333, ## min_child_weight = 2, subsample = 1, objective = \u0026quot;binary:logistic\u0026quot;), ## data = x$data, nrounds = 1000, watchlist = x$watchlist, verbose = 0, ## nthread = 1) ## params (as set within xgb.train): ## eta = \u0026quot;0.005\u0026quot;, max_depth = \u0026quot;15\u0026quot;, gamma = \u0026quot;1\u0026quot;, colsample_bytree = \u0026quot;1\u0026quot;, colsample_bynode = \u0026quot;0.333333333333333\u0026quot;, min_child_weight = \u0026quot;2\u0026quot;, subsample = \u0026quot;1\u0026quot;, objective = \u0026quot;binary:logistic\u0026quot;, nthread = \u0026quot;1\u0026quot;, validate_parameters = \u0026quot;TRUE\u0026quot; ## xgb.attributes: ## niter ## callbacks: ## cb.evaluation.log() ## # of features: 9 ## niter: 1000 ## nfeatures : 9 ## evaluation_log: ## iter training_logloss ## 1 0.690979 ## 2 0.689212 ## --- ## 999 0.304644 ## 1000 0.304552 Checking its confusion matrix, accuracy and other metrics.\npredict(xg_fit, tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% conf_mat(truth = Survived, estimate = .pred_class) ## Truth ## Prediction 0 1 ## 0 389 50 ## 1 22 206 predict(xg_fit, tit_train) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% accuracy(truth = Survived, estimate = .pred_class) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 accuracy binary 0.892 predict(xg_fit, tit_train, type = \u0026#34;prob\u0026#34;) %\u0026gt;% bind_cols(tit_train) %\u0026gt;% roc_auc(truth = Survived, estimate = .pred_0) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 roc_auc binary 0.946 So, this final model has 89.2% accuracy with an AUC of 0.946 after being trained on all of the training data.\nLet’s look at the important predictors according to it.\nimportances = xgboost::xgb.importance(model = extract_fit_engine(xg_fit)) importances %\u0026gt;% mutate(Feature = fct_reorder(Feature, Gain)) %\u0026gt;% ggplot(aes(Gain, Feature)) + geom_col() Being male had a significant effect on survival. Fare — which likely represents which class people were from — also helps predict whether they survived. The class 3 indicator is the most important class feature: I had expected membership in class 1 (upper class) to be the informative signal, but the model leans on class 3 instead. Where passengers embarked has little effect.\nComparing Logistic Regression and xgboost (Training Sets)\nModel | Accuracy | ROC AUC\nLogistic Regression | 0.792 | 0.856\nLogistic Regression (Bootstrapped Resamples) | 0.791 | 0.844\nxgboost | 0.892 | 0.946\nAccuracy and AUC for logistic regression and xgboost.\nBoosted trees perform better than logistic regression on the training set, by around 10 percentage points of accuracy.
Let’s test both models on the held-out test set.\nTesting Model Let’s test the models on tit_test.\nLogistic Regression (with Bootstraps) glm_fit %\u0026gt;% predict(tit_test) %\u0026gt;% bind_cols(tit_test) %\u0026gt;% conf_mat(truth = Survived, estimate = .pred_class) ## Truth ## Prediction 0 1 ## 0 113 22 ## 1 25 64 glm_fit %\u0026gt;% predict(tit_test) %\u0026gt;% bind_cols(tit_test) %\u0026gt;% accuracy(truth = Survived, estimate = .pred_class) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 accuracy binary 0.790 glm_fit %\u0026gt;% predict(tit_test, type = \u0026#34;prob\u0026#34;) %\u0026gt;% bind_cols(tit_test) %\u0026gt;% roc_auc(truth = Survived, estimate = .pred_0) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 roc_auc binary 0.850 xgboost predict(xg_fit, tit_test) %\u0026gt;% bind_cols(tit_test) %\u0026gt;% conf_mat(truth = Survived, estimate = .pred_class) ## Truth ## Prediction 0 1 ## 0 120 19 ## 1 18 67 predict(xg_fit, tit_test) %\u0026gt;% bind_cols(tit_test) %\u0026gt;% accuracy(truth = Survived, estimate = .pred_class) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 accuracy binary 0.835 predict(xg_fit, tit_test, type = \u0026#34;prob\u0026#34;) %\u0026gt;% bind_cols(tit_test) %\u0026gt;% roc_auc(truth = Survived, estimate = .pred_0) ## # A tibble: 1 × 3 ## .metric .estimator .estimate ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 roc_auc binary 0.881 Model | Accuracy | ROC AUC\nLogistic Regression (Bootstrapped) | 0.790 | 0.850\nxgboost | 0.835 | 0.881\nTest accuracies for the logistic regression and xgboost models.\nThe xgboost model performs a little better. Since Kaggle judges only by accuracy (and I’d argue the task of machine learning is to predict, with little focus on inference), I would choose xgboost as the final model.\n","permalink":"/titanic-who-survived-the-tragegy/","summary":"Yet Another Machine Learning Project with Titanic Dataset","title":"Titanic: Who Survived The Tragedy?"},{"content":" Yesterday I was talking to one of my friends about his plans post-PhD. “I want to go for pure sciences and abstract mathematics, but there are hardly any positions in academia on these topics,” he said. It got me thinking about how many PhD students graduate every year and whether demand (in academia or in industry) falls short of that. But I didn’t even know how many PhDs are awarded each year, let alone how many are employed.\nWhile searching for a dataset for my Text Mining class project, I discovered this dataset on the number of PhDs by field.
So, let’s explore!\nlibrary(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4 ## ✓ tibble 3.1.6 ✓ dplyr 1.0.8.9000 ## ✓ tidyr 1.2.0 ✓ stringr 1.4.0 ## ✓ readr 2.1.2 ✓ forcats 0.5.1 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() library(garlic) library(DT) theme_set(theme_linedraw()) # Loading dataset from their repository phds = readr::read_csv(\u0026#34;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-19/phd_by_field.csv\u0026#34;) ## Rows: 3370 Columns: 5 ## ── Column specification ──────────────────────────────────────────────────────── ## Delimiter: \u0026quot;,\u0026quot; ## chr (3): broad_field, major_field, field ## dbl (2): year, n_phds ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. phds ## # A tibble: 3,370 × 5 ## broad_field major_field field year n_phds ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Life sciences Agricultural sciences and natural resources Agric… 2008 111 ## 2 Life sciences Agricultural sciences and natural resources Agric… 2008 28 ## 3 Life sciences Agricultural sciences and natural resources Agric… 2008 3 ## 4 Life sciences Agricultural sciences and natural resources Agron… 2008 68 ## 5 Life sciences Agricultural sciences and natural resources Anima… 2008 41 ## 6 Life sciences Agricultural sciences and natural resources Anima… 2008 18 ## 7 Life sciences Agricultural sciences and natural resources Anima… 2008 77 ## 8 Life sciences Agricultural sciences and natural resources Envir… 2008 182 ## 9 Life sciences Agricultural sciences and natural resources Fishi… 2008 52 ## 10 Life sciences Agricultural sciences and natural resources Food … 2008 96 ## # … with 3,360 more rows The records come at three levels of granularity: broad field, major field and field. There are 337 fields, and we have records for each of them from 2008 to 2017. Let’s see how many people are from which field.\nphds %\u0026gt;% group_by(broad_field) %\u0026gt;% summarise(n_phds = sum(n_phds, na.rm = T)) %\u0026gt;% arrange(desc(n_phds)) %\u0026gt;% datatable(colnames = c(\u0026#34;Broad Field\u0026#34;, \u0026#34;Number of PhDs\u0026#34;), rownames = FALSE, caption = \u0026#34;Number of PhDs by their broad fields. Life sciences lead the way.\u0026#34;) %\u0026gt;% formatRound(\u0026#34;n_phds\u0026#34;, digits = 0) Life sciences has the most graduates. Engineering has the fewest — even fewer than the mysterious Other. Surprisingly, social sciences, humanities and education rank higher than mathematics and computer science, and they lead by a margin: the number of graduates in “humanities and social science” subjects is four times the number of PhDs in “hard sciences” like engineering and maths. No wonder there is such a shortage of people in the tech world.\nLife sciences is such a broad, encompassing field.
Let’s explore what is covered in life sciences.\nphds %\u0026gt;% filter(broad_field == \u0026#34;Life sciences\u0026#34;) %\u0026gt;% group_by(major_field) %\u0026gt;% summarise(n_phds = sum(n_phds, na.rm = T)) %\u0026gt;% arrange(desc(n_phds)) %\u0026gt;% datatable(colnames = c(\u0026#34;Major Field\u0026#34;, \u0026#34;Number of PhDs\u0026#34;), rownames = FALSE, caption = \u0026#34;Number of PhDs by their major fields. Biology, excluding health sciences, leads the way.\u0026#34;) %\u0026gt;% formatRound(\u0026#34;n_phds\u0026#34;, digits = 0) Biological and biomedical sciences has the most graduates. There are very few PhDs in geosciences; with climate change becoming a major issue, I wonder why the field isn’t picking up faster. Let me explore engineering too.\nLet’s see the fields in engineering.\nphds %\u0026gt;% filter(broad_field == \u0026#34;Engineering\u0026#34;) %\u0026gt;% group_by(major_field) %\u0026gt;% summarise(n_phds = sum(n_phds, na.rm = T)) %\u0026gt;% arrange(desc(n_phds)) ## # A tibble: 1 × 2 ## major_field n_phds ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Other engineering 18139 Oh, so no information. The information is nested in another column, I guess. I’ll have to group by field.\nphds %\u0026gt;% filter(broad_field == \u0026#34;Engineering\u0026#34;) %\u0026gt;% group_by(field) %\u0026gt;% summarise(n_phds = sum(n_phds, na.rm = T)) %\u0026gt;% arrange(desc(n_phds)) %\u0026gt;% datatable(colnames = c(\u0026#34;Field\u0026#34;, \u0026#34;Number of PhDs\u0026#34;)) %\u0026gt;% formatRound(\u0026#34;n_phds\u0026#34;, digits = 0) Computer engineering PhDs are the most popular, with twice as many as the next on the list. Environmental engineering is the second most popular. That’s impressive. Let’s visualise the counts.\nphds %\u0026gt;% filter(broad_field == \u0026#34;Engineering\u0026#34;) %\u0026gt;% group_by(field) %\u0026gt;% summarise(n_phds = sum(n_phds, na.rm = T)) %\u0026gt;% ggplot(aes(reorder(field, n_phds), n_phds)) + geom_col() + coord_flip() + labs(y = \u0026#34;Number of PhDs\u0026#34;, x = \u0026#34;Field (Engineering only)\u0026#34;) The data gives me the opportunity to see how the field grew with the rise in popularity of computer engineering. I’ve heard numerous times that its popularity has increased over the years.\n# ggrepel for text labels library(ggrepel) phds %\u0026gt;% filter(broad_field == \u0026#34;Engineering\u0026#34;) %\u0026gt;% mutate(label = if_else(year == max(year), field, NA_character_)) %\u0026gt;% ggplot(aes(x = year, y = n_phds, colour = field)) + geom_line() + scale_x_continuous(breaks = seq(from = 2008, to = 2017, by = 1)) + geom_label_repel(aes(label = label), nudge_x = 1, na.rm = TRUE) + labs(x = \u0026#34;Year\u0026#34;, y = \u0026#34;Number of PhDs\u0026#34;) + theme(legend.position = \u0026#34;none\u0026#34;) ## Warning: Removed 20 row(s) containing missing values (geom_path). ## Warning: ggrepel: 10 unlabeled data points (too many overlaps).
Consider ## increasing max.overlaps phds_top_engineering = phds %\u0026gt;% filter(broad_field == \u0026#34;Engineering\u0026#34;) %\u0026gt;% group_by(field) %\u0026gt;% summarise(n_phds = sum(n_phds)) %\u0026gt;% filter(n_phds \u0026gt; 100) %\u0026gt;% slice_max(order_by = n_phds, n = 6) phds_top_engineering ## # A tibble: 6 × 2 ## field n_phds ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Computer engineering 4030 ## 2 Environmental, environmental health engineeringl 2001 ## 3 Engineering, other 1488 ## 4 Nuclear engineering 1166 ## 5 Operations research (engineering) 985 ## 6 Systems engineering 924 phds %\u0026gt;% filter(field %in% phds_top_engineering$field) %\u0026gt;% ggplot(aes(x = year, y = n_phds, fill = field)) + geom_bar(stat = \u0026#34;identity\u0026#34;) + scale_x_continuous(labels = scales::label_number(accuracy = 1)) + scale_fill_manual(values = MetBrewer::met.brewer(\u0026#34;Hokusai1\u0026#34;, 6)) + facet_wrap( ~ field) + labs(x = \u0026#34;Year\u0026#34;, y = \u0026#34;Number of PhDs\u0026#34;, fill = \u0026#34;Field\u0026#34;) Computer engineering has been consistently popular. I didn’t expect that.\nBut wait, wasn’t there a computer science in major_field? What was that? It was called Mathematics and computer sciences.\nphds %\u0026gt;% filter(broad_field == \u0026#34;Mathematics and computer sciences\u0026#34;) %\u0026gt;% group_by(major_field) %\u0026gt;% summarise(n_phds = sum(n_phds, na.rm = T)) %\u0026gt;% arrange(desc(n_phds)) %\u0026gt;% datatable(colnames = c(\u0026#34;Major Field\u0026#34;, \u0026#34;Number of PhDs\u0026#34;), rownames = FALSE, caption = \u0026#34;Mathematics and computer sciences has two fields.\u0026#34;) %\u0026gt;% formatRound(\u0026#34;n_phds\u0026#34;, digits = 0) phds %\u0026gt;% filter(broad_field == \u0026#34;Mathematics and computer sciences\u0026#34;) %\u0026gt;% filter(n_phds \u0026gt;= 300) %\u0026gt;% mutate(label = if_else(year == max(year), field, NA_character_)) %\u0026gt;% ggplot(aes(x = year, y = n_phds, colour = field)) + geom_line() + scale_x_continuous(breaks = seq(from = 2008, to = 2017, by = 1)) + geom_label_repel(aes(label = label), nudge_x = 1, na.rm = TRUE) + labs(x = \u0026#34;Year\u0026#34;, y = \u0026#34;Number of PhDs\u0026#34;) + theme(legend.position = \u0026#34;none\u0026#34;) Computer engineering averaged around 400 PhDs a year; computer science averaged around 1500. I think this is the “computer science” of general parlance.\nThis exploration is incomplete.
I couldn’t finish it in time, but I’ll get back to it someday.\nToday I found this wonderful visualisation on Twitter that I thought I’d replicate for the number of PhDs by field.\nlibrary(tweetrmd) tweet_screenshot(\u0026#34;https://twitter.com/jenjentro/status/1512997114896269312?t=nWQqyQa3tHQVNSHPakh2TA\u0026#34;) Her code is available on GitHub.\n# Loading packages library(tidytuesdayR) library(tidylog) ## ## Attaching package: 'tidylog' ## The following objects are masked from 'package:dplyr': ## ## add_count, add_tally, anti_join, count, distinct, distinct_all, ## distinct_at, distinct_if, filter, filter_all, filter_at, filter_if, ## full_join, group_by, group_by_all, group_by_at, group_by_if, ## inner_join, left_join, mutate, mutate_all, mutate_at, mutate_if, ## relocate, rename, rename_all, rename_at, rename_if, rename_with, ## right_join, sample_frac, sample_n, select, select_all, select_at, ## select_if, semi_join, slice, slice_head, slice_max, slice_min, ## slice_sample, slice_tail, summarise, summarise_all, summarise_at, ## summarise_if, summarize, summarize_all, summarize_at, summarize_if, ## tally, top_frac, top_n, transmute, transmute_all, transmute_at, ## transmute_if, ungroup ## The following objects are masked from 'package:tidyr': ## ## drop_na, fill, gather, pivot_longer, pivot_wider, replace_na, ## spread, uncount ## The following object is masked from 'package:stats': ## ## filter library(showtext) ## Loading required package: sysfonts ## Loading required package: showtextdb ","permalink":"/number-of-phds/","summary":"An Incomplete Data Exploration","title":"Number of PhDs by Field"},{"content":" If it takes less than two minutes to do, do it now. The mental overhead of remembering it is greater than the inconvenience of doing it now.\nDo things, tell people.1\nBuild a model of everything. The model doesn\u0026rsquo;t have to be perfect, just better than a coin toss. Absorb the fact that you could be 100% wrong.2\nBe selective in what you believe in. There are good ideas in bad people\u0026rsquo;s brains and bad ideas in good people\u0026rsquo;s brains. As Gandhi said, hate the sin, not the sinner.\nBecome the best in the world at what you do. Keep redefining what you do until this is true. That is: don\u0026rsquo;t be the best; be the only.\nWork as hard as you can, even though who you work with and what you work on matter more than how hard you work.\nA deadline weeds out the extraneous and the ordinary. It prevents you from trying to make it perfect, so you have to make it different. Different is better.3\nPromptness is a sign of respect. Be impatient.4\nNever leave an empty water bottle. Fill it up when it\u0026rsquo;s near-empty.\nContent is more important than the medium it\u0026rsquo;s processed or presented in.\nExperiments are usually easier than creating theories. Prove it to yourself.\nIf you can\u0026rsquo;t decide, the answer is no.\nDo things that your future self will thank you for.\nAsk dumb questions. Don\u0026rsquo;t be afraid to be wrong.\nDon\u0026rsquo;t use an alarm every day. Only use one when you have to wake up at a specific time. Let your brain rest.\nYou might also be interested in Bullets of Wisdom.\nThis is a live blog and is expected to be updated frequently.\nhttp://carl.flax.ie/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDarwin writes in his autobiography that he found it necessary to write down every piece of evidence which appeared to contradict his beliefs, because otherwise it would disappear from his mind.
When you find apparent flaws, you\u0026rsquo;ve got to be sensitive and keep track of those things, and keep an eye out for how they can be explained or how the theory can be changed to fit them.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nFrom Kevin Kelly\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSo there you have it: patience is overrated. But perseverance? Now that has some value.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/rules/","summary":"Constants, not variables","title":"Rules for Life"},{"content":"Conventions used throughout this page x is the name of the variable.\nSet Working Directory cd \u0026#34;/directory/\u0026#34; Use a Dataset use data.dta, clear Clear all items from memory clear all See documentation for details.\nDisplay all items list Display first element only (useful for a scalar) di x Setting number of observations qui set obs 30 qui says the command is executed quietly, i.e. with no output.\nSummary of a variable summarize x This will report the number of observations, mean, standard deviation, minimum and maximum.\nDescribe describe x This will tell us the type of variable it is.\nFunctions Functions in Stata are called Programs. See documentation.\nBelow is a sample program named onesample that generates 30 uniform random numbers and calculates their mean. We need to declare rclass to specify the kind of return the program uses. Another option is eclass.\ndrop _all is different from drop all; the former drops all observations, while the latter looks for a variable named all and deletes it.\nprogram onesample, rclass drop _all qui set obs 30 gen x = runiform() summ x return scalar mean = r(mean) end Monte Carlo Simulation If the function is defined as above, we can use the following code to perform a Monte Carlo simulation.\n* Simulate it 10000 times simulate xbar = r(mean), seed(0) reps(10000): onesample Visualisation Histogram hist x, width(0.1) title(\u0026#34;Histogram of x\u0026#34;) See documentation for details.\n","permalink":"/notes-on-stata/","summary":"Some nitty-gritties that economists remember but I don\u0026rsquo;t","title":"Notes on STATA"},{"content":"\nDriving is probably the riskiest adventure you undertake all day. In fact, Americans spend over 71 billion hours on the open roads in a year.1 The risk of an accident while driving is much higher than in most other adventures we undertake. Skydiving? One in 500,000.2 Mountaineering? One in a thousand.3 Driving? One in 400.4 Still, cars are the preferred mode in the US. I guess habits are more potent than reason.\nEurope vs the United States Public transport in Europe is built differently. At European airports, you\u0026rsquo;d see colossal parking lots for people flying out of the cities. The general trend is to get to the airport and then fly out. Contrast this with the US: people just drive through states with their luggage.\nThere are good reasons for their choices, of course. Gasoline is cheaper in the US than in most of the world, thanks to billions of dollars spent researching how to make shale oil useful.5 Road connectivity in the US is probably the best I\u0026rsquo;ve seen so far.6 The interstate system connects almost all cities, and almost every region is accessible.\nIn contrast, Europe is best suited for air travel. The continent is full of mountains and difficult-to-drive terrain.
Small countries have at least one airport in their capital cities, and flights between countries are economical for obvious reasons.\nIndian Railways: The Lifeline of India Indian railway systems are economical for very different reasons. The peninsula allowed for a dense railway network around the country, and high population density together with a large overall population ensures ridership volumes. The system is not profitable, but it doesn\u0026rsquo;t have to be.\nThe profitability of public utilities is a capitalist (and possibly American) concept. They wouldn\u0026rsquo;t be \u0026ldquo;public utilities\u0026rdquo; if they were profitable \u0026mdash; they\u0026rsquo;d be business ventures. Some activities are not worthwhile for business, but we do them because they are necessary. Wars are largely non-profitable, but we still have them. Then why not public transport?\nPitfalls of Car Dependency However, the biggest reason is that not everyone can afford private transport. Cars are expensive, even if you ignore gas prices. Every month, you would spend $200 on insurance, another $200 on a parking permit \u0026mdash; and that\u0026rsquo;s aside from the $500 you pay for your car loan. In contrast, I can travel from any place to any place in India for a little over $20. In many European countries and cities, public transport is free!\nDisadvantaged communities do not have equal access to necessities. I understand the right to travel isn\u0026rsquo;t a fundamental right, but the purpose of government goes beyond ensuring fundamental rights. Milton Friedman identified four essential government roles in his book Free to Choose. The fourth one was helping disadvantaged communities in the best possible way \u0026mdash; whether through indirect support or direct action.\nSome communities cannot travel from one place to another in the US only because they cannot afford a car. A single flight costs a fourth of their monthly income. Wouldn\u0026rsquo;t Friedman be sad about this reality?\nAnother limitation is innovation. Somehow we have stopped being creative about public transport. New technology in travel is mainly incremental rather than revolutionary. Bullet trains are almost sixty years old now. The Boring Company started an initiative, but it\u0026rsquo;s experimental and not scalable. (It is in LA only because Musk lives in LA; there are no plans to expand.) We strive to improve our train speeds or have Twitter spats over legroom. We have accepted that these systems are expensive and stopped innovating to reduce the price!\nThe Future of Public Transport An excellent public transport system also helps the world be greener. It is easier and cheaper to design electric buses and trains than electric cars. Elon Musk\u0026rsquo;s Tesla is a strong force in making cars electric, but full adoption will take a long while. Musk\u0026rsquo;s original electric vehicle plan was to iterate from a sports car like a Ferrari to a general-purpose car like a Honda Civic. We are still around the sports-car stage, and an all-electric fleet is a long way off.\nA thriving public transport system is essential to the development of a country. It normalises opportunities for everyone and supports the environment.
If some day I had to design a country\u0026rsquo;s public transport system, I would look at road transport in the US, railways in India and flights in Europe for inspiration.\nAmerica\u0026rsquo;s deadliest road: New Port Richey, Florida Added on November 12, 2022.\nThere is something I would like to add to this post. Today, Dea shared a video on America’s deadliest road: US-19 in New Port Richey, Florida. A group of researchers found 60 pedestrian hotspots in the US. This 1 km stretch topped their list.\nIt has got a lot to do with how the cities are designed.\nAlong the road is a panoply of American consumerism: Walmart, Publix, tattoo parlors, chain hotels, motels, 7-Elevens, multiple Dunkin’s, medical equipment stores, condemned buildings, strip clubs, auto body repair shops, oil change places, custom paint job businesses, chain restaurants, deserted property waiting to be redeveloped, and a mini-golf course where you can feed baby alligators, fenced in near the sidewalk.\nWalk along this road, and you might begin to notice the danger. The speed limit is 45 to 55 miles per hour, but the cars are often going much faster. The crosswalks are so few and far between that a simple act — crossing the street to get to a business a few hundred feet away — might mean walking over half a mile to reach the nearest crosswalk. Even with sidewalks set back from the road, it’s clear that US-19 wasn’t built for pedestrians.\nWhy is it this bad? Experts would tell you that speed is the first cause. When drivers travel at 15 miles per hour (25 km per hour), they have broad peripheral vision and need only 25 feet of distance to react and stop. At 50 miles per hour (80 km per hour), peripheral vision narrows and they require 118 feet to react and stop — roughly three times the speed needs almost five times the distance. The pedestrian fatality risk rises from 2% at 15 miles per hour to 85% at 50 miles per hour.\nAnother cause is the number of lanes. 97% of the cities that had pedestrian fatalities had multiple lanes. This stretch of road has eight lanes. Once you start crossing, you must make it across all eight lanes. Chicken road crossing game, anyone? The more lanes, the more the risk. The road is also straight as a ruler, with no curves or bends to slow cars down, and wide enough that cars can go 80 km per hour without any problem. The posted speed limit is 45 miles per hour (72 km per hour).\nArterial roads are a problem too: a design where car-centric roads mingle with residential neighbourhoods.\nThese roads make up only 13% of the US road network but are sites of 40% of all pedestrian fatalities.\nSince the distance between the two crosswalks is a km, or around 3300 ft, people resort to jaywalking. (By the way, the more appropriate term is crossing at a location where there is no signal. Jaywalking is a term invented by automobile lobbyists to blame pedestrians for their deaths. Before the 1930s, crossing the road anywhere you wanted was socially and legally acceptable.)\nYou can learn more about it in this Vox video.\nWhat can be done? We have to start reducing the number of lanes. Highways may have eight lanes, but residential streets should not have more than a lane. Additionally, install street parking and sidewalks, with several additional crosswalks.7\nThe government of Florida is spending millions on getting this fixed.
The mayor also envisions adding pedestrian and cyclist bridges, avoiding the need to cross the road altogether.\nMap of New Port Richey, Florida According to the American Driving Survey conducted by the AAA Foundation for Traffic Safety. See Zeke Hartner, WTOP for more details.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nStatistics show that there is one tandem student fatality for every 500,000 tandem jumps. See Oklahoma Skydiving for more details.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nRauch et al. (2020) found an incidence of 2.5 accidents per 1000 mountaineers, or 5.6 injuries per 10,000 hours of mountaineering. I have rounded the number for the purpose of my argument.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nAccording to Esurance, the chances of an average American getting into a car accident during a 1,000-mile trip are 1 in 366. I have rounded the number in my argument.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nBloomberg has an excellent article on the topic: \u0026ldquo;After Blowing $300 Billion, U.S. Shale Finally Makes Money\u0026rdquo;.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMy Chinese friends disagree, but I can only confirm once I visit China myself.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nCity design can determine a lot about pedestrian and bike safety. Check out this TED talk where Jeff Speck explains how to make cities more walkable. There has to be a reason to walk, the walk has to feel and be safe, the walk has to be pleasant and comfortable, and the walk has to be interesting.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/public-transport-and-lack-thereof/","summary":"Why aren\u0026rsquo;t we designing anything new?","title":"Public Transport \u0026 Lack Thereof"},{"content":"Likelihood Likelihood is a measure of how plausible a parameter value is, given our data. It is calculated as the joint probability of the observed data as a function of the parameters for a given statistical model. Since joint probabilities are usually products, it is much easier to deal with the log-likelihood, which is a sum of log-probabilities.\nNote that the likelihood is not the PDF of the parameters. In maximum likelihood estimation, the likelihood function (i.e. the joint probability) is maximised to obtain a specific value of the parameter. This value is “most likely” to be the true but unknown parameter. Generally, the true parameter is denoted as \\(\\theta\\) and its estimate is denoted as \\(\\hat{\\theta}\\).\nScore The score is the derivative of the log-likelihood function with respect to the parameter. It is usually evaluated at a particular value of the parameter.\n$$ s(\\theta) = \\frac{\\partial \\log \\mathcal{L}(\\theta)}{\\partial \\theta} $$\nFor an \\(m\\)-dimensional parameter, this differentiation leads to a \\(1 \\times m\\) vector. This vector indicates the sensitivity of the likelihood. Under certain regularity conditions1, we can prove that the expected value of \\(s(\\theta)\\) is zero when evaluated at the true \\(\\theta\\).\n$$ E \\left[ \\frac{\\partial}{\\partial \\theta} \\log f(X; \\theta) \\bigg\\rvert \\theta \\right] \\ = \\int_\\mathbb{R} \\frac{\\frac{\\partial}{\\partial \\theta} f(x; \\theta)}{f(x;\\theta)} f(x;\\theta) dx \\ = \\frac{\\partial}{\\partial \\theta} \\int_{\\mathbb{R}} f(x;\\theta) dx \\ = \\frac{\\partial}{\\partial \\theta} 1 \\ = 0. $$\nFisher’s Information Fisher’s information is a way to measure how much information a known random variable \\(X\\) carries about the unknown population parameter \\(\\theta\\) that is supposed to model \\(X\\). It is calculated as the variance of the score.2
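As a quick worked example (my addition, not in the original note), consider a single observation \\(X \\sim \\text{Bernoulli}(\\theta)\\). Then\n$$ \\log f(x; \\theta) = x \\log \\theta + (1 - x) \\log (1 - \\theta), \\qquad s(\\theta) = \\frac{x}{\\theta} - \\frac{1 - x}{1 - \\theta}, $$\nand since \\(E[s(\\theta)] = 0\\), the variance of the score is\n$$ I(\\theta) = E[s(\\theta)^2] = \\theta \\cdot \\frac{1}{\\theta^2} + (1 - \\theta) \\cdot \\frac{1}{(1 - \\theta)^2} = \\frac{1}{\\theta(1 - \\theta)}. $$\nA numerical sanity check in R (a sketch; the value 0.3 is arbitrary):\n# Empirical variance of the Bernoulli score; should be close to 1/(theta * (1 - theta)) = 4.76\ntheta = 0.3\nx = rbinom(1e6, 1, theta)\nvar(x/theta - (1 - x)/(1 - theta))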
It is calculated as the variance of the score.2\nImagine you want to assess how good an estimate is, given all our knowledge about it. Think of \\(f(x; \\theta)\\) for the observed data as a function of \\(\\theta\\). If it is sharply peaked around one value of \\(\\theta\\), the data give us a good estimate of the true \\(\\theta\\). If it is more evenly spread, we know little about the true value. Consequently, we would need many more samples to accurately know what the value should be. Theoretically, we would need to know the entire population.\nThis intuitive understanding tells us that there should be some way to quantify how much information we have or how much information we need. Thus, there has to be a measure of variability with respect to \\(\\theta\\).\nTherefore, Fisher’s information is defined as the variance of the score, \\(s(\\theta)\\).\n$$ I(\\theta) = E \\left[ \\left(\\frac{\\partial}{\\partial \\theta} \\log f(X; \\theta) \\right)^2 \\bigg\\rvert \\theta\\right]. $$\nFor a continuous PDF, this can be evaluated as the following.\n$$ I(\\theta) = \\int_\\mathbb{R} \\left( \\frac{\\partial}{\\partial \\theta} \\log f(x;\\theta) \\right)^2 f(x;\\theta) dx. $$\nSince this is a variance with a square term, \\(I(\\theta)\\) is always non-negative.\nSometimes, it is also written as the following.\n$$ I(\\theta) = -E \\left[ \\frac{\\partial^2}{\\partial \\theta^2} \\log f(X; \\theta) \\bigg\\rvert \\theta \\right], $$\nas\n$$ \\frac{\\partial^2}{\\partial \\theta^2} \\log f(X; \\theta) = \\frac{\\frac{\\partial^2}{\\partial \\theta^2} f(X; \\theta)}{f(X; \\theta)} - \\left( \\frac{\\frac{\\partial}{\\partial \\theta} f(X; \\theta)}{f(X; \\theta)} \\right)^2 = \\frac{\\frac{\\partial^2}{\\partial \\theta^2} f(X; \\theta)}{f(X; \\theta)} - \\left( \\frac{\\partial}{\\partial \\theta} \\log f(X; \\theta) \\right)^2. $$\nAlso note that\n$$ E \\left[ \\frac{\\frac{\\partial^2}{\\partial \\theta^2} f(X; \\theta)}{f(X; \\theta)} \\bigg\\rvert \\theta \\right] = \\frac{\\partial^2}{\\partial \\theta^2} \\int_\\mathbb{R} f(x; \\theta) dx = 0, $$\nby the same interchange of differentiation and integration used in Score. Putting this result in the last equation, we obtain the expected result.\nCramer-Rao Bound The Cramer-Rao bound is a related concept. It states that the inverse of Fisher information is a lower bound on the variance of any unbiased estimator \\(\\hat{\\theta}\\). That is,\n$$ V (\\hat{\\theta}) \\geq \\frac{1}{I(\\theta)}. $$
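To make these notions concrete, here is a small worked example that I am adding for illustration: a single observation from a Bernoulli(\\(\\theta\\)) model, where \\(f(x; \\theta) = \\theta^x (1-\\theta)^{1-x}\\).\n$$ \\log f(x; \\theta) = x \\log \\theta + (1-x) \\log (1-\\theta), \\qquad s(\\theta) = \\frac{x}{\\theta} - \\frac{1-x}{1-\\theta}. $$\nSince \\(E[X] = \\theta\\), the score indeed has expectation zero, and\n$$ I(\\theta) = E[s(\\theta)^2] = \\theta \\cdot \\frac{1}{\\theta^2} + (1-\\theta) \\cdot \\frac{1}{(1-\\theta)^2} = \\frac{1}{\\theta(1-\\theta)}. $$\nThe Cramer-Rao bound then says any unbiased estimator of \\(\\theta\\) must have variance at least \\(\\theta(1-\\theta)\\), which is exactly the variance of \\(X\\) itself, so the single observation already attains the bound.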
To estimate the population parameter using the maximum likelihood approach, we need to assume certain conditions about our probability density and likelihood functions. The continuity assumption is easily satisfied in real life but compactness isn’t, as the parameter space is often unbounded. Even when it is bounded, the bounds are usually unknown. For more details, see this.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nAll that follows is a correct but strong simplification of the complex concepts. For more mathematical notions, check Wikipedia and Ly et al. (2017).\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/score-and-fisher-s-information/","summary":"Intuitive Understanding and Mathematical Notions","title":"Score and Fisher's Information"},{"content":"In this tutorial-cum-note, I will demonstrate how to use Logistic Regression and Random Forest algorithms to predict the sex of a penguin. The data penguins comes from the palmerpenguins package in R. It was collected by Dr. Kristen Gorman on three species of penguins at the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.\nThe goal is to build a classifier model that predicts the sex of a penguin given its physical characteristics.\nThe package can be installed from CRAN.\n# If you don\u0026#39;t have palmerpenguins package, first install it. # install.packages(\u0026#34;palmerpenguins\u0026#34;) # Loading Libraries library(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4 ## ✓ tibble 3.1.6 ✓ dplyr 1.0.7.9000 ## ✓ tidyr 1.1.4 ✓ stringr 1.4.0 ## ✓ readr 2.0.2 ✓ forcats 0.5.1 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() library(palmerpenguins) # setting my personal theme theme_set(theme_h()) # Loading data data(\u0026#34;penguins\u0026#34;) penguins ## # A tibble: 344 × 8 ## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g ## \u0026lt;fct\u0026gt; \u0026lt;fct\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;int\u0026gt; ## 1 Adelie Torgersen 39.1 18.7 181 3750 ## 2 Adelie Torgersen 39.5 17.4 186 3800 ## 3 Adelie Torgersen 40.3 18 195 3250 ## 4 Adelie Torgersen NA NA NA NA ## 5 Adelie Torgersen 36.7 19.3 193 3450 ## 6 Adelie Torgersen 39.3 20.6 190 3650 ## 7 Adelie Torgersen 38.9 17.8 181 3625 ## 8 Adelie Torgersen 39.2 19.6 195 4675 ## 9 Adelie Torgersen 34.1 18.1 193 3475 ## 10 Adelie Torgersen 42 20.2 190 4250 ## # … with 334 more rows, and 2 more variables: sex \u0026lt;fct\u0026gt;, year \u0026lt;int\u0026gt; We see that there are missing values. Let\u0026rsquo;s see how many of them are missing.\nsum(is.na(penguins)) ## [1] 19 So, nineteen entries are missing. Most likely, I will exclude them from the analysis at present, but before that, I want to explore the data as it is.\nExploration One of the best methods to do it is via count() from dplyr.\npenguins %\u0026gt;% count(species) ## # A tibble: 3 × 2 ## species n ## \u0026lt;fct\u0026gt; \u0026lt;int\u0026gt; ## 1 Adelie 152 ## 2 Chinstrap 68 ## 3 Gentoo 124 penguins %\u0026gt;% count(island) ## # A tibble: 3 × 2 ## island n ## \u0026lt;fct\u0026gt; \u0026lt;int\u0026gt; ## 1 Biscoe 168 ## 2 Dream 124 ## 3 Torgersen 52 penguins %\u0026gt;% count(species, island) ## # A tibble: 5 × 3 ## species island n ## \u0026lt;fct\u0026gt; \u0026lt;fct\u0026gt; \u0026lt;int\u0026gt; ## 1 Adelie Biscoe 44 ## 2 Adelie Dream 56 ## 3 Adelie Torgersen 52 ## 4 Chinstrap Dream 68 ## 5 Gentoo Biscoe 124 penguins %\u0026gt;% count(sex) ## # A tibble: 3 × 2 ## sex n ## \u0026lt;fct\u0026gt; \u0026lt;int\u0026gt; ## 1 female 165 ## 2 male 168 ## 3 \u0026lt;NA\u0026gt; 11 penguins %\u0026gt;% count(year) ## # A tibble: 3 × 2 ## year n ## \u0026lt;int\u0026gt; \u0026lt;int\u0026gt; ## 1 2007 110 ## 2 2008 114 ## 3 2009 120 Cool. It looks pretty balanced.\npenguins %\u0026gt;% filter(!is.na(sex)) %\u0026gt;% ggplot(aes(flipper_length_mm, bill_length_mm, colour = sex, size = body_mass_g)) + geom_point(alpha = 0.6) + facet_wrap(~species) In general, there is a significant difference in bill_length_mm between the two sexes. Also, bill length for Adelie is shorter than the other two species \u0026mdash; for both the sexes.\nThere are also packages like DataExplorer that can aid in the process.
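For instance, assuming you have the package installed, a couple of calls give a quick overview (a sketch, not run here):\n# quick automated EDA with DataExplorer library(DataExplorer) plot_missing(penguins) # share of missing values per column plot_bar(penguins) # bar charts of the categorical columns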
For this simple case, I want to demonstrate how to create a classification model, so I will skip that process.\nThe dataset has missing values, as I noted earlier. So, I will remove the observations with missing values. I also do not need year and island for my classification model \u0026mdash; there is no logical reason why they should affect the sex of a penguin.\npenguins_df = penguins %\u0026gt;% filter(!is.na(sex)) %\u0026gt;% select(-year, -island) Modelling Let\u0026rsquo;s start by loading tidymodels and setting the seed for randomisation.\nlibrary(tidymodels) ## Registered S3 method overwritten by \u0026#39;tune\u0026#39;: ## method from ## required_pkgs.model_spec parsnip ## ── Attaching packages ────────────────────────────────────── tidymodels 0.1.4 ── ## ✓ broom 0.7.10 ✓ rsample 0.1.1 ## ✓ dials 0.0.10 ✓ tune 0.1.6 ## ✓ infer 1.0.0 ✓ workflows 0.2.4 ## ✓ modeldata 0.1.1 ✓ workflowsets 0.1.0 ## ✓ parsnip 0.1.7 ✓ yardstick 0.0.9 ## ✓ recipes 0.1.17 ## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ── ## x scales::discard() masks purrr::discard() ## x dplyr::filter() masks stats::filter() ## x recipes::fixed() masks stringr::fixed() ## x dplyr::lag() masks stats::lag() ## x yardstick::spec() masks readr::spec() ## x recipes::step() masks stats::step() ## • Dig deeper into tidy modeling with R at https://www.tmwr.org set.seed(1) The first step of modelling is to create the training and testing split. The test set will never be exposed to us during the modelling process. Once our final model is made, we can evaluate it against the test set.\nIn tidymodels, this split is created using the initial_split function. I also have the option to stratify the split \u0026mdash; worth doing here so that the proportion of male and female penguins stays the same in both sets. (The species are also unbalanced: we only have 68 Chinstrap penguins but 152 Adelie and 124 Gentoo.) Let\u0026rsquo;s explore the help file for initial_split too.\n?initial_split # Specify the split penguins_split = initial_split(penguins_df, strata = sex) The proportion is set by default to 75% in training and 25% in testing.
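If you want a different ratio, initial_split also takes a prop argument; a quick sketch (not run here):\n# an 80/20 split instead of the default 75/25, still stratified by sex penguins_split_80 = initial_split(penguins_df, prop = 0.8, strata = sex)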
The functions training() and testing() will give me the resulting datasets.\npenguins_train = training(penguins_split) penguins_test = testing(penguins_split) penguins_train ## # A tibble: 249 × 6 ## species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex ## \u0026lt;fct\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;int\u0026gt; \u0026lt;fct\u0026gt; ## 1 Adelie 39.5 17.4 186 3800 female ## 2 Adelie 40.3 18 195 3250 female ## 3 Adelie 36.7 19.3 193 3450 female ## 4 Adelie 36.6 17.8 185 3700 female ## 5 Adelie 38.7 19 195 3450 female ## 6 Adelie 35.9 19.2 189 3800 female ## 7 Adelie 37.9 18.6 172 3150 female ## 8 Adelie 39.5 16.7 178 3250 female ## 9 Adelie 39.5 17.8 188 3300 female ## 10 Adelie 42.2 18.5 180 3550 female ## # … with 239 more rows penguins_test ## # A tibble: 84 × 6 ## species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex ## \u0026lt;fct\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;int\u0026gt; \u0026lt;fct\u0026gt; ## 1 Adelie 38.9 17.8 181 3625 female ## 2 Adelie 39.2 19.6 195 4675 male ## 3 Adelie 41.1 17.6 182 3200 female ## 4 Adelie 34.4 18.4 184 3325 female ## 5 Adelie 46 21.5 194 4200 male ## 6 Adelie 37.8 18.3 174 3400 female ## 7 Adelie 37.7 18.7 180 3600 male ## 8 Adelie 35.3 18.9 187 3800 female ## 9 Adelie 40.6 18.6 183 3550 male ## 10 Adelie 40.5 17.9 187 3200 female ## # … with 74 more rows The datasets look good; our split worked well.\nBootstrapping Samples Note that our training sample has only 249 rows. This is not a huge dataset, and we cannot be sure that a model built from it will generalise.\nOne simple way to address this is to use bootstrapped samples. Each bootstrap sample has the original sample size but differs from the others: the observations are sampled with replacement. Again, see the help file to understand the function.\n?bootstraps penguins_boot = bootstraps(penguins_train) penguins_boot ## # Bootstrap sampling ## # A tibble: 25 × 2 ## splits id ## \u0026lt;list\u0026gt; \u0026lt;chr\u0026gt; ## 1 \u0026lt;split [249/85]\u0026gt; Bootstrap01 ## 2 \u0026lt;split [249/93]\u0026gt; Bootstrap02 ## 3 \u0026lt;split [249/85]\u0026gt; Bootstrap03 ## 4 \u0026lt;split [249/93]\u0026gt; Bootstrap04 ## 5 \u0026lt;split [249/84]\u0026gt; Bootstrap05 ## 6 \u0026lt;split [249/87]\u0026gt; Bootstrap06 ## 7 \u0026lt;split [249/88]\u0026gt; Bootstrap07 ## 8 \u0026lt;split [249/92]\u0026gt; Bootstrap08 ## 9 \u0026lt;split [249/93]\u0026gt; Bootstrap09 ## 10 \u0026lt;split [249/86]\u0026gt; Bootstrap10 ## # … with 15 more rows So, we have 25 bootstrapped samples, each with different resamples. Of course, if you have enough data or are confident enough about your data, you can skip this step.
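Cross-validation is the other common resampling choice here. A sketch using vfold_cv() from rsample (loaded with tidymodels); I am not running it in this tutorial:\n# ten-fold cross-validation, stratified by the outcome penguins_folds = vfold_cv(penguins_train, v = 10, strata = sex)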
Logistic Regression Pipeline The logistic regression model is one of the simplest classification models. It is also the basic building block of neural networks; it dictates how a node behaves. Until 2010, when neural networks and support vector machines gained popularity, logistic regression was the model in force.\nEven today, the model is widely used in a variety of real-world applications. The biggest benefits of logistic regression models are their explainability and simple linear implementation.\nThe first step will be to set up the model pipeline. This only sets up how the model would work; neither training nor testing has happened yet.\n# Simple Logistic Regression glm_spec = logistic_reg() %\u0026gt;% set_engine(\u0026#34;glm\u0026#34;) glm_spec ## Logistic Regression Model Specification (classification) ## ## Computational engine: glm There are other alternatives too. We can use Lasso regression (yes, Lasso can be used for classification as well. It \u0026ldquo;estimate[s] the parameters of the binomial GLM by optimising the binomial likelihood whilst imposing the lasso penalty on the parameter estimates\u0026rdquo;.) Or we can just use a regularised classification model.\n# regularised regression glm_spec = logistic_reg() %\u0026gt;% set_engine(\u0026#34;glmnet\u0026#34;) # LASSO regression glm_spec = logistic_reg(mixture = 1) %\u0026gt;% set_engine(\u0026#34;glmnet\u0026#34;) But for this simple tutorial, I will stick to the simple logistic regression model.\n# Simple Logistic Regression glm_spec = logistic_reg() %\u0026gt;% set_engine(\u0026#34;glm\u0026#34;) Random Forest Pipeline Let\u0026rsquo;s set up a pipeline for the random forest model as well. The good part about random forest models is that they do not require huge tuning efforts like neural networks do.\nRandom forest models can be used for classification as well as regression. Furthermore, there are many implementations (packages) in R to choose from. randomForest is probably the best-known one. ranger is a fast implementation of random forest models in R. I will use ranger for this model.\n# Engine could be spark rand_forest() %\u0026gt;% set_mode(\u0026#34;classification\u0026#34;) %\u0026gt;% set_engine(\u0026#34;spark\u0026#34;) ## Random Forest Model Specification (classification) ## ## Computational engine: spark # Or it could be randomForest rand_forest() %\u0026gt;% set_mode(\u0026#34;classification\u0026#34;) %\u0026gt;% set_engine(\u0026#34;randomForest\u0026#34;) ## Random Forest Model Specification (classification) ## ## Computational engine: randomForest # Or ranger rf_spec = rand_forest() %\u0026gt;% set_mode(\u0026#34;classification\u0026#34;) %\u0026gt;% set_engine(\u0026#34;ranger\u0026#34;) rf_spec ## Random Forest Model Specification (classification) ## ## Computational engine: ranger Workflow The next step in the modelling pipeline is setting up the workflow with formula, model and data \u0026mdash; in that order. Because I have multiple models that I want to compare, I will only set up the formula in my workflow.\npenguin_wf = workflow() %\u0026gt;% add_formula(sex ~ .) penguin_wf ## ══ Workflow ════════════════════════════════════════════════════════════════════ ## Preprocessor: Formula ## Model: None ## ## ── Preprocessor ──────────────────────────────────────────────────────────────── ## sex ~ . As you can see, there is no model set yet.\nTraining Logistic Regression Let\u0026rsquo;s add the logistic regression model. I can fit it directly to the training sample.\npenguin_wf %\u0026gt;% add_model(glm_spec) %\u0026gt;% fit(data = penguins_train) ## ══ Workflow [trained] ══════════════════════════════════════════════════════════ ## Preprocessor: Formula ## Model: logistic_reg() ## ## ── Preprocessor ──────────────────────────────────────────────────────────────── ## sex ~ .
## ## ── Model ─────────────────────────────────────────────────────────────────────── ## ## Call: stats::glm(formula = ..y ~ ., family = stats::binomial, data = data) ## ## Coefficients: ## (Intercept) speciesChinstrap speciesGentoo bill_length_mm ## -95.852333 -6.932255 -8.535185 0.633832 ## bill_depth_mm flipper_length_mm body_mass_g ## 2.014378 0.056401 0.006365 ## ## Degrees of Freedom: 248 Total (i.e. Null); 242 Residual ## Null Deviance:\t345.2 ## Residual Deviance: 85.49 AIC: 99.49 I get the coefficients and other details for my model, which is great.\nHowever, as I said before, I can\u0026rsquo;t be absolutely sure of my model right away because of the small sample. So, I will use the bootstrapped samples that I created earlier. Setting verbose = TRUE in control_resamples() would print every fitting step; I keep it off here.\nglm_rs = penguin_wf %\u0026gt;% add_model(glm_spec) %\u0026gt;% fit_resamples(resamples = penguins_boot, control = control_resamples(save_pred = TRUE, verbose = F)) ## ! Bootstrap12: preprocessor 1/1, model 1/1: glm.fit: fitted probabilities numerically 0... glm_rs ## Warning: This tuning result has notes. Example notes on model fitting include: ## preprocessor 1/1, model 1/1: glm.fit: fitted probabilities numerically 0 or 1 occurred ## # Resampling results ## # Bootstrap sampling ## # A tibble: 25 × 5 ## splits id .metrics .notes .predictions ## \u0026lt;list\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; ## 1 \u0026lt;split [249/85]\u0026gt; Bootstrap01 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [85 ×… ## 2 \u0026lt;split [249/93]\u0026gt; Bootstrap02 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [93 ×… ## 3 \u0026lt;split [249/85]\u0026gt; Bootstrap03 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [85 ×… ## 4 \u0026lt;split [249/93]\u0026gt; Bootstrap04 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [93 ×… ## 5 \u0026lt;split [249/84]\u0026gt; Bootstrap05 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [84 ×… ## 6 \u0026lt;split [249/87]\u0026gt; Bootstrap06 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [87 ×… ## 7 \u0026lt;split [249/88]\u0026gt; Bootstrap07 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [88 ×… ## 8 \u0026lt;split [249/92]\u0026gt; Bootstrap08 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [92 ×… ## 9 \u0026lt;split [249/93]\u0026gt; Bootstrap09 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [93 ×… ## 10 \u0026lt;split [249/86]\u0026gt; Bootstrap10 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [86 ×… ## # … with 15 more rows One bootstrapped sample had sampling issues (its training labels were unbalanced). To solve this, I could have specified strata = sex in bootstraps().
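A one-line sketch of that fix, for reference:\n# stratified bootstrap resamples keep the sex classes balanced penguins_boot = bootstraps(penguins_train, strata = sex)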
In this case it is acceptable because the other 24 worked well.\nTraining Random Forest The process is almost the same as that for logistic regression.\nrf_rs = penguin_wf %\u0026gt;% add_model(rf_spec) %\u0026gt;% fit_resamples(resamples = penguins_boot, control = control_resamples(save_pred = TRUE, verbose = F)) rf_rs ## # Resampling results ## # Bootstrap sampling ## # A tibble: 25 × 5 ## splits id .metrics .notes .predictions ## \u0026lt;list\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; ## 1 \u0026lt;split [249/85]\u0026gt; Bootstrap01 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [85 ×… ## 2 \u0026lt;split [249/93]\u0026gt; Bootstrap02 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [93 ×… ## 3 \u0026lt;split [249/85]\u0026gt; Bootstrap03 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [85 ×… ## 4 \u0026lt;split [249/93]\u0026gt; Bootstrap04 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [93 ×… ## 5 \u0026lt;split [249/84]\u0026gt; Bootstrap05 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [84 ×… ## 6 \u0026lt;split [249/87]\u0026gt; Bootstrap06 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [87 ×… ## 7 \u0026lt;split [249/88]\u0026gt; Bootstrap07 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [88 ×… ## 8 \u0026lt;split [249/92]\u0026gt; Bootstrap08 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [92 ×… ## 9 \u0026lt;split [249/93]\u0026gt; Bootstrap09 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [93 ×… ## 10 \u0026lt;split [249/86]\u0026gt; Bootstrap10 \u0026lt;tibble [2 × 4]\u0026gt; \u0026lt;tibble [0 × 1]\u0026gt; \u0026lt;tibble [86 ×… ## # … with 15 more rows Notice that I did not get the same warning for the random forest model. Why? Because random forest is not probabilistic in nature. Tree-based models do not require balanced samples; they will simply give biased results, and it is up to the researcher to investigate the flaws. That\u0026rsquo;s why they are a little tricky.\nEvaluation How well do they compare against each other? The metrics to compare can be obtained using collect_metrics().\nLogistic Regression Metrics collect_metrics(glm_rs) ## # A tibble: 2 × 6 ## .metric .estimator mean n std_err .config ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; ## 1 accuracy binary 0.905 25 0.00695 Preprocessor1_Model1 ## 2 roc_auc binary 0.971 25 0.00291 Preprocessor1_Model1 Random Forest Metrics collect_metrics(rf_rs) ## # A tibble: 2 × 6 ## .metric .estimator mean n std_err .config ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; ## 1 accuracy binary 0.900 25 0.00613 Preprocessor1_Model1 ## 2 roc_auc binary 0.970 25 0.00272 Preprocessor1_Model1 Logistic regression performs slightly better in both metrics: accuracy and AUC. Even if they were nearly equal and I had to choose, I would choose the linear model. It is faster to implement, scalable and, most importantly, explainable.
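If you want the two sets of metrics in one table for a side-by-side look, a small convenience sketch (my own addition, not part of the original workflow):\nbind_rows( collect_metrics(glm_rs) %\u0026gt;% mutate(model = \u0026#34;logistic\u0026#34;), collect_metrics(rf_rs) %\u0026gt;% mutate(model = \u0026#34;random forest\u0026#34;) ) %\u0026gt;% select(model, .metric, mean, std_err)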
Predictions The predictions can be found using the collect_predictions() function.\nglm_rs %\u0026gt;% collect_predictions() ## # A tibble: 2,250 × 7 ## id .pred_female .pred_male .row .pred_class sex .config ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;fct\u0026gt; \u0026lt;fct\u0026gt; \u0026lt;chr\u0026gt; ## 1 Bootstrap01 0.580 0.420 6 female female Preprocessor1_M… ## 2 Bootstrap01 0.987 0.0125 7 female female Preprocessor1_M… ## 3 Bootstrap01 0.978 0.0219 9 female female Preprocessor1_M… ## 4 Bootstrap01 0.0277 0.972 10 male female Preprocessor1_M… ## 5 Bootstrap01 0.842 0.158 11 female female Preprocessor1_M… ## 6 Bootstrap01 1.00 0.000350 12 female female Preprocessor1_M… ## 7 Bootstrap01 0.999 0.000525 13 female female Preprocessor1_M… ## 8 Bootstrap01 1.00 0.00000806 14 female female Preprocessor1_M… ## 9 Bootstrap01 0.918 0.0824 16 female female Preprocessor1_M… ## 10 Bootstrap01 0.966 0.0341 19 female female Preprocessor1_M… ## # … with 2,240 more rows The two important columns here are .pred_female and .pred_male: the probabilities that a penguin belongs to each class. .pred_class gives the class that our model predicts a penguin to be in; sex is the true class.\nConfusion Matrix conf_mat_resampled() with no arguments gives the confusion matrix in tidy format, averaged across the resamples.\nglm_rs %\u0026gt;% conf_mat_resampled() ## # A tibble: 4 × 3 ## Prediction Truth Freq ## \u0026lt;fct\u0026gt; \u0026lt;fct\u0026gt; \u0026lt;dbl\u0026gt; ## 1 female female 39 ## 2 female male 4.04 ## 3 male female 4.44 ## 4 male male 42.5 Let\u0026rsquo;s see it in conventional format.\nglm_rs %\u0026gt;% collect_predictions() %\u0026gt;% conf_mat(sex, .pred_class) ## Truth ## Prediction female male ## female 975 101 ## male 111 1063 The model looks pretty good.\nROC Curve The ROC curve can be produced by the roc_curve() function. autoplot() uses default settings.\nglm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(sex, .pred_female) %\u0026gt;% autoplot() How does autoplot() work?\nglm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(sex, .pred_female) ## # A tibble: 2,296 × 4 ## # Groups: id [25] ## id .threshold specificity sensitivity ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Bootstrap01 -Inf 0 1 ## 2 Bootstrap01 6.67e-10 0 1 ## 3 Bootstrap01 7.64e- 8 0.025 1 ## 4 Bootstrap01 8.21e- 8 0.05 1 ## 5 Bootstrap01 1.28e- 7 0.075 1 ## 6 Bootstrap01 1.58e- 7 0.1 1 ## 7 Bootstrap01 3.33e- 6 0.125 1 ## 8 Bootstrap01 1.42e- 5 0.15 1 ## 9 Bootstrap01 1.49e- 5 0.175 1 ## 10 Bootstrap01 2.19e- 5 0.2 1 ## # … with 2,286 more rows Let\u0026rsquo;s beautify our ROC curve.\nglm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(sex, .pred_female) %\u0026gt;% ggplot(aes(1 - specificity, sensitivity, col = id)) + geom_abline(lty = 2, colour = \u0026#34;grey80\u0026#34;, size = 1.5) + geom_path(show.legend = FALSE, alpha = 0.6, size = 1.2) + coord_equal() I\u0026rsquo;m using geom_path instead of geom_line because I want to see discrete jumps. geom_line would give me a continuous plot as it connects the points in the order of the variable on the x-axis.
Another option is geom_step, which only highlights changes \u0026mdash; when a variable steps to take another value.\n# Using geom_line() glm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(sex, .pred_female) %\u0026gt;% ggplot(aes(1 - specificity, sensitivity, col = id)) + geom_abline(lty = 2, colour = \u0026#34;grey80\u0026#34;, size = 1.5) + geom_line(show.legend = FALSE, alpha = 0.6, size = 1.2) + coord_equal() # Using geom_step() glm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(sex, .pred_female) %\u0026gt;% ggplot(aes(1 - specificity, sensitivity, col = id)) + geom_abline(lty = 2, colour = \u0026#34;grey80\u0026#34;, size = 1.5) + geom_step(show.legend = FALSE, alpha = 0.6, size = 1.2) + coord_equal() geom_abline() can take two arguments: intercept and slope. If you provide none of these, it plots \\(y = x\\). So, the best plot is from geom_path().\nglm_rs %\u0026gt;% collect_predictions() %\u0026gt;% group_by(id) %\u0026gt;% # -- to get 25 ROC curves, for each bootstrapped sample roc_curve(sex, .pred_female) %\u0026gt;% ggplot(aes(1 - specificity, sensitivity, col = id)) + geom_abline(lty = 2, colour = \u0026#34;grey80\u0026#34;, size = 1.5) + geom_path(show.legend = FALSE, alpha = 0.6, size = 1.2) + coord_equal() Okay, enough on ggplot2. From the above ROC curves, we can deduce the model is doing great on the training samples.\nTesting Samples The above metrics used only the training data, through its bootstrap resamples. We need to check our model\u0026rsquo;s performance on the test dataset too.\nLet\u0026rsquo;s fit the model using last_fit(). last_fit() fits the final best model to the training set and evaluates the test set.\npenguins_final = penguin_wf %\u0026gt;% add_model(glm_spec) %\u0026gt;% last_fit(penguins_split) penguins_final ## # Resampling results ## # Manual resampling ## # A tibble: 1 × 6 ## splits id .metrics .notes .predictions .workflow ## \u0026lt;list\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; \u0026lt;list\u0026gt; ## 1 \u0026lt;split [249/84]\u0026gt; train/test split \u0026lt;tibble [… \u0026lt;tibble … \u0026lt;tibble [84 … \u0026lt;workflo… Let\u0026rsquo;s check how good our final model is.\npenguins_final %\u0026gt;% collect_metrics() ## # A tibble: 2 × 4 ## .metric .estimator .estimate .config ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; ## 1 accuracy binary 0.905 Preprocessor1_Model1 ## 2 roc_auc binary 0.966 Preprocessor1_Model1 penguins_final %\u0026gt;% collect_predictions() ## # A tibble: 84 × 7 ## id .pred_female .pred_male .row .pred_class sex .config ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;fct\u0026gt; \u0026lt;fct\u0026gt; \u0026lt;chr\u0026gt; ## 1 train/test split 0.887 0.113 6 female female Preprocess… ## 2 train/test split 0.0000979 1.00 7 male male Preprocess… ## 3 train/test split 0.976 0.0238 8 female female Preprocess… ## 4 train/test split 0.996 0.00431 14 female female Preprocess… ## 5 train/test split 0.000000623 1.00 15 male male Preprocess… ## 6 train/test split 0.973 0.0273 16 female female Preprocess… ## 7 train/test split 0.772 0.228 17 female male Preprocess… ## 8 train/test split 0.662 0.338 21 female female Preprocess… ## 9 train/test split 0.434 0.566 22 male male Preprocess… ## 10 train/test split 0.961 0.0388 23
female female Preprocess… ## # … with 74 more rows Our model is 90.5% accurate and has an AUC of 0.966 on the test set. These are very good numbers.\nConfusion Matrix for Test Set penguins_final %\u0026gt;% collect_predictions() %\u0026gt;% conf_mat(sex, .pred_class) ## Truth ## Prediction female male ## female 39 5 ## male 3 37 This does pretty well, to be honest.\nFinal Workflow Let\u0026rsquo;s see the final model.\npenguins_final$.workflow[[1]] ## ══ Workflow [trained] ══════════════════════════════════════════════════════════ ## Preprocessor: Formula ## Model: logistic_reg() ## ## ── Preprocessor ──────────────────────────────────────────────────────────────── ## sex ~ . ## ## ── Model ─────────────────────────────────────────────────────────────────────── ## ## Call: stats::glm(formula = ..y ~ ., family = stats::binomial, data = data) ## ## Coefficients: ## (Intercept) speciesChinstrap speciesGentoo bill_length_mm ## -95.852333 -6.932255 -8.535185 0.633832 ## bill_depth_mm flipper_length_mm body_mass_g ## 2.014378 0.056401 0.006365 ## ## Degrees of Freedom: 248 Total (i.e. Null); 242 Residual ## Null Deviance:\t345.2 ## Residual Deviance: 85.49 AIC: 99.49 Can we tidy it up? (We need [[1]] to get the element out, as .workflow is a list.)\npenguins_final$.workflow[[1]] %\u0026gt;% tidy() ## # A tibble: 7 × 5 ## term estimate std.error statistic p.value ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 (Intercept) -95.9 18.1 -5.30 0.000000115 ## 2 speciesChinstrap -6.93 1.91 -3.62 0.000295 ## 3 speciesGentoo -8.54 3.25 -2.63 0.00866 ## 4 bill_length_mm 0.634 0.159 4.00 0.0000639 ## 5 bill_depth_mm 2.01 0.455 4.43 0.00000940 ## 6 flipper_length_mm 0.0564 0.0653 0.863 0.388 ## 7 body_mass_g 0.00637 0.00138 4.63 0.00000374 The coefficients can be exponentiated to find the odds ratios.\npenguins_final$.workflow[[1]] %\u0026gt;% tidy(exponentiate = TRUE) %\u0026gt;% arrange(estimate) ## # A tibble: 7 × 5 ## term estimate std.error statistic p.value ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 (Intercept) 2.35e-42 18.1 -5.30 0.000000115 ## 2 speciesGentoo 1.96e- 4 3.25 -2.63 0.00866 ## 3 speciesChinstrap 9.76e- 4 1.91 -3.62 0.000295 ## 4 body_mass_g 1.01e+ 0 0.00138 4.63 0.00000374 ## 5 flipper_length_mm 1.06e+ 0 0.0653 0.863 0.388 ## 6 bill_length_mm 1.88e+ 0 0.159 4.00 0.0000639 ## 7 bill_depth_mm 7.50e+ 0 0.455 4.43 0.00000940 The odds ratio of 7.50 means that for every one mm increase in bill depth, the odds of being male increase by almost eight times. So, bill depth is very important.\nFlipper length is not very important. Remember that previously we explored the relationship between sex and flipper length. Since flipper length is not important (high p-value), let\u0026rsquo;s see how the graph looks with bill depth, which is apparently very important.\npenguins %\u0026gt;% filter(!is.na(sex)) %\u0026gt;% ggplot(aes(bill_depth_mm, bill_length_mm, colour = sex, size = body_mass_g)) + geom_point(alpha = 0.6) + facet_wrap(~species) Conclusion In creating this short tutorial, I learnt how to classify data using the tidymodels workflow with logistic regression and random forest.
An important lesson was that logistic regression can outperform complicated tree-based models like random forest too.\nIf you are interested in data science and R, check out my free weekly newsletter Next.\nNext \u0026mdash; Today I Learnt About R A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.\nYou can read the past editions and subscribe to it here.\n","permalink":"/classification-logistic-regression-and-random-forest/","summary":"Using Tidymodels to Find Sex of Penguins","title":"Classification: Logistic Regression and Random Forest"},{"content":"When I first learnt linear regression six years ago, I was surprised by its power. I could know the effect of one phenomenon on another and the extent of the relationship. As years passed by, I revisited its different parts in pieces. Stability. Consistency. Precision. R Squared. The list goes on.\nSometime in the past, I decided to compile all I had learnt about linear regression. However, such massive projects never see fruition. In my MBA, Prof Amlesh Sharma from Texas A\u0026amp;M taught this in his Advanced Marketing Analytics class; I didn\u0026rsquo;t find this technique elsewhere. Using this simple trick, you can find the optimal value of an input to produce the maximum output.\nFormulation At its heart, linear regression is fitting a straight line to the existing data so that the line is closest to the points. We solve an optimisation problem to minimise the distance between the regression line and the actual observations. This approach is called \u0026ldquo;loss minimisation\u0026rdquo;. Another popular method is to \u0026ldquo;maximise likelihood\u0026rdquo;. Explanation of these methods is beyond the scope of this article.\nVariables Independent variables (X): Independent variables are assumed to be independent of each other and the cause of an effect.\nDependent variable (Y): The dependent variable is the final effect we try to estimate or predict.\nCase Study: Number of Golf Courses There is a strong theoretical reason to believe that the number of golf courses in a state would depend on climate, population, per capita income and popularity. Consider a simple case where you know only the population of the state and the number of golf courses in that state.\nThe linear model to find the number of golf courses can be written as the following.\n$$ Y = \\beta_0 + \\beta_1 X + \\varepsilon $$\n\\(Y\\) is the number of golf courses in the state and \\(X\\) is population.\nIs the relationship linear? Let\u0026rsquo;s check it.\nData Data on golf courses by US state is available at this website. I scraped it into a .CSV file available here.\nlibrary(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4 ## ✓ tibble 3.1.6 ✓ dplyr 1.0.7.9000 ## ✓ tidyr 1.1.4 ✓ stringr 1.4.0 ## ✓ readr 2.0.2 ✓ forcats 0.5.1 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() theme_set(theme_h()) golf = read.csv(\u0026#34;https://www.harsh17.in/using-linear-regression-to-find-optimal-value/data/golf.csv\u0026#34;) %\u0026gt;% as_tibble() golf ## # A tibble: 52 × 4 ## State Location.quotient Establishments Employment ## \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 U.S.
Total 1 11,088 306,782 ## 2 Florida 2.66 884 48,670 ## 3 Hawaii 2.5 57 3,491 ## 4 South Carolina 2.1 256 9,094 ## 5 Arizona 1.58 168 9,417 ## 6 Nevada 1.39 83 3,959 ## 7 North Carolina 1.35 402 12,496 ## 8 Alabama 1.14 150 4,670 ## 9 Pennsylvania 1.13 551 13,975 ## 10 Georgia 1.1 279 10,279 ## # … with 42 more rows Population of each US state is available on this website. I downloaded the .CSV file available from there (click here to download).\npopulation = read.csv(\u0026#34;https://www.harsh17.in/using-linear-regression-to-find-optimal-value/data/state_population.csv\u0026#34;) %\u0026gt;% as_tibble() population ## # A tibble: 52 × 9 ## rank State Pop Growth Pop2021 Pop2010 growthSince2010 Percent density ## \u0026lt;int\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;int\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;int\u0026gt; \u0026lt;int\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 1 Califor… 3.97e7 0.0013 3.96e7 3.73e7 0.0628 0.118 255. ## 2 2 Texas 3.01e7 0.0124 2.97e7 2.52e7 0.192 0.0896 115. ## 3 3 Florida 2.22e7 0.0106 2.19e7 1.88e7 0.177 0.066 414. ## 4 4 New York 1.92e7 -0.004 1.93e7 1.94e7 -0.0091 0.0572 408. ## 5 5 Pennsyl… 1.28e7 0.0001 1.28e7 1.27e7 0.0074 0.0381 286. ## 6 6 Illinois 1.25e7 -0.0041 1.26e7 1.28e7 -0.0251 0.0372 225. ## 7 7 Ohio 1.17e7 0.0011 1.17e7 1.15e7 0.0163 0.0349 287. ## 8 8 Georgia 1.09e7 0.0098 1.08e7 9.71e6 0.126 0.0325 190. ## 9 9 North C… 1.08e7 0.0099 1.07e7 9.57e6 0.129 0.0322 222. ## 10 10 Michigan 1.00e7 0.0003 9.99e6 9.88e6 0.0119 0.0297 177. ## # … with 42 more rows Note that there is a timeline mismatch. Golf course data is from 2017, and population data is from 2021. However, since the purpose of this article is to show linear regression-based optimisation and not a real-world application, it\u0026rsquo;s acceptable.\nWrangling Data Let\u0026rsquo;s merge the two datasets and retain only the columns that we need.\n# inner join df = inner_join(golf, population) ## Joining, by = \u0026#34;State\u0026#34; # retain only number of golf courses and population in 2021 df = df %\u0026gt;% select(State, Establishments, Pop2021) # Establishments is in characters, so convert it to numeric df$Establishments = as.numeric(df$Establishments) Visualisation Just out of curiosity, let\u0026rsquo;s see how the number of golf courses varies with state population.\ndf %\u0026gt;% ggplot(aes(x = Pop2021, y = Establishments, colour = State, size = Pop2021)) + geom_point(alpha = 0.5, show.legend = FALSE) So there is a straightforward pattern: as the population increases, the number of golf courses increases. However, the variance also increases with population. When the population is around 20 million, golf courses can be about 600 or 800. There will likely be heteroskedasticity problems (the variance depends on the population level) if we directly apply a linear model.\nOptimisation Imagine if I rewrite the linear model differently.\n$$ Y = a + bX + cX^2 $$\n\\(Y\\) is the number of golf courses and \\(X\\) is the population.\nThen, I can calculate \\(\\frac{dY}{dX}\\) as the following.\n$$ \\frac{dY}{dX} = b + 2cX $$\nFor the optimal value \u0026mdash; minimum or maximum \u0026mdash; we will set the first order condition to zero. Thus,\n$$ X = \\frac{-b}{2c}. $$\nWhether it gives us a minimum or a maximum will depend on the sign of \\(\\frac{d^2Y}{dX^2} = 2c\\), i.e. the sign of \\(c\\).
So, let\u0026rsquo;s first estimate the model and see what we get.\nfit = lm(Establishments ~ Pop2021 + I(Pop2021^2), data = df) summary(fit) ## ## Call: ## lm(formula = Establishments ~ Pop2021 + I(Pop2021^2), data = df) ## ## Residuals: ## Min 1Q Median 3Q Max ## -154.203 -48.364 2.627 41.706 244.067 ## ## Coefficients: ## Estimate Std. Error t value Pr(\u0026gt;|t|) ## (Intercept) -1.070e+01 2.219e+01 -0.482 0.632 ## Pop2021 4.648e-05 4.580e-06 10.149 1.55e-13 *** ## I(Pop2021^2) -7.671e-13 1.328e-13 -5.778 5.46e-07 *** ## --- ## Signif. codes: 0 \u0026#39;***\u0026#39; 0.001 \u0026#39;**\u0026#39; 0.01 \u0026#39;*\u0026#39; 0.05 \u0026#39;.\u0026#39; 0.1 \u0026#39; \u0026#39; 1 ## ## Residual standard error: 87.98 on 48 degrees of freedom ## Multiple R-squared: 0.8092,\tAdjusted R-squared: 0.8013 ## F-statistic: 101.8 on 2 and 48 DF, p-value: \u0026lt; 2.2e-16 We see that \\(c = -7.6 \\times 10^{-13}\\), which is negative. Thus, the value we get from \\(X = \\frac{-b}{2c}\\) will be a maximiser. At that population, we would have the maximum number of golf courses.\nLet\u0026rsquo;s calculate that critical value.\ncc = coef(fit) print(unname(-cc[2]/(2*cc[3]))) ## [1] 30296969 X = -cc[2]/(2*cc[3]) Thus, the population that maximises the number of golf courses is 30,296,969 \u0026mdash; around thirty million.\nHow many golf courses will we have in that case?\n(Y = cc %*% c(1, X, X^2)) ## [,1] ## [1,] 693.4493 It will have around 693 golf courses. This looks like a correct answer from the plot as well.\ndf %\u0026gt;% ggplot(aes(x = Pop2021, y = Establishments)) + geom_point(alpha = 0.5) + geom_point(aes(X, Y), colour = \u0026#34;red\u0026#34;, size = 2) ","permalink":"/using-linear-regression-to-find-optimal-value/","summary":"Using linear regression to find the optimal value of an input","title":"Linear Regression and Optimisation"},{"content":"\nOnce I learned enough statistics, I realised that being true for most of the population doesn\u0026rsquo;t mean it\u0026rsquo;s true for all folks. Most algorithms are approximately wrong. Considering both of these statements, all of what we know could be false. Improbable but not impossible.\nScientists considered everything that could\u0026rsquo;ve impacted our survival today to calculate our probability of existence. They considered the solar storms that the earth faced, the asteroids that killed dinosaurs, the wars we all survived. The chance of us being alive today is one in four hundred trillion.1 That is, 1:400,000,000,000,000. With fourteen zeros. A perfect example of improbable but not impossible.\nNassim Nicholas Taleb calls these black swan events \u0026mdash; events that are nearly impossible to predict. Everyone thought there were no black swans until someone caught them sunbathing in Australia. Algorithms are even worse at catching them. The over-reliance on prediction accuracy steals attention from their likelihood of happening and our confidence in them. In his thesis work with Marvin Minsky, Patrick Winston concluded that the difficulty of machine learning is that a program can only learn something it nearly already knows.2\nThen there are issues of reproducibility as well. Everyone is different from one another, and that\u0026rsquo;s a universal fact. However, once in groups, this is much easier to model. I can\u0026rsquo;t say if Sarah would eat from McDonald\u0026rsquo;s today, but I know at least sixty million people will.3 Statistics and the central limit theorem are great friends.\nWhat about individual behaviour?
That\u0026rsquo;s too wild to predict. Or is it? Internet companies are doing it so well. Facebook and Google have personalised services just for me. But that\u0026rsquo;s still based on group patterns. The system looks for users like me and tries to make me like them.\nThe real trouble is when we forget how inaccurate they are; when we fail to acknowledge that they have the intelligence of a carrot and over-rely on them. Situations like these result in false positives and fatal casualties. We need to provoke future statisticians and engineers to think about these human biases and instincts. What do they think about them, and why? The biggest lesson of education is not how to think but what to think about.\nThe good thing is there is a solution. Just be a little more aware. How? Paul Graham has an idea.4\nYou can also take more explicit measures to prevent yourself from automatically adopting conventional opinions. The most general is to cultivate an attitude of skepticism. When you hear someone say something, stop and ask yourself \u0026ldquo;Is that true?\u0026rdquo; Don\u0026rsquo;t say it out loud. I\u0026rsquo;m not suggesting that you impose on everyone who talks to you the burden of proving what they say, but rather that you take upon yourself the burden of evaluating what they say.\nHe further adds:\nTreat it as a puzzle. You know that some accepted ideas will later turn out to be wrong. See if you can guess which. The end goal is not to find flaws in the things you\u0026rsquo;re told, but to find the new ideas that had been concealed by the broken ones. So this game should be an exciting quest for novelty, not a boring protocol for intellectual hygiene. And you\u0026rsquo;ll be surprised, when you start asking \u0026ldquo;Is this true?\u0026rdquo;, how often the answer is not an immediate yes. If you have any imagination, you\u0026rsquo;re more likely to have too many leads to follow than too few.\nThe general goal is to understand the limits of what we know and how confident we are about it, and, when we are not confident, to have the maturity to consider being wrong as a possibility. Building up from that maturity is much easier.\nSomeday, when I teach the future stalwarts, a thought-provoking ethical question will be part of the exam. Everyone will get full credit but will have to answer thoughtfully.\nThis number comes from one of my favourite TED talks: How to stop screwing yourself over by Mel Robbins at TEDx San Francisco.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nLearning Structural Descriptions from Examples. DSpace@MIT.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nPhotographer Nolan Conway travelled across 22 states in two months to see just exactly who eats at McDonald\u0026rsquo;s. His photo essay is awesome. Data source is also the same.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nHow to Think for Yourself by Paul Graham. November 2020.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/improbable-doesn-t-mean-impossible/","summary":"Please don\u0026rsquo;t treat them alike","title":"Improbable Doesn't Mean Impossible"},{"content":"Installing Packages The process to install a package depends on the Python environment you are using.\nThere are a few possibilities: pip, conda and installing from within JupyterHub. If you are using Anaconda (Navigator), use conda. pip will install for all environments; conda will install for the (activated) conda environment only.
See the Python Environments section below for details on Python environments.\nGenerally, always use conda for handling environments and pip to install packages.\npip install \u0026lt;package\u0026gt; For a specific version, you can use pip install pandas==2.0.1.\nHere are some common pip commands that\u0026rsquo;ll come in handy. You can get the full list with pip help.\npip freeze or pip list will give a list of all packages that are installed along with their version numbers pip check will list all packages that have conflicted or broken requirements pip show numpy will show all details of that package, including package version and location where it is installed. (numpy is an example, replace accordingly). To uninstall a package, use pip uninstall \u0026lt;package\u0026gt;.\nTerminal in Jupyter Terminal commands can be passed within the Jupyter notebook. Any command starting with ! is run like a terminal command within the Jupyter notebook.\n# list all files in current directory !ls # delete all files in the current folder !rm -r * Debugging in the Current State Many a time, our code crashes and we would like to investigate the environment at that moment. Running %debug in Jupyter Notebook immediately after the code crashes will open a debugging environment where you can see variables\u0026rsquo; values, etc.\n# some code that crashed x = some_function(y) # in the next Jupyter cell, run this command %debug Printing There are two methods: print() and pprint(). They both serve the same cause, but pprint() is better at displaying complex data structures such as lists of lists or JSON files.\nIt is worthwhile to mention f-strings. They are usually more compact than writing full print statements. Furthermore, if you want to print the value of a variable, they\u0026rsquo;re really short.\n# enclose the variable in curly braces print(f\u0026#34;Value of x is {x}\u0026#34;) # to print both the variable name and its value, use {x=} print(f\u0026#34;{x=}\u0026#34;) Adding Rows It is more efficient to create a list of rows first and then convert it to a pandas data frame. As qmeeus said on SO,\nPandas dataframes do not work as a list, they are much more complex data structures and appending is not really considered the best approach.\ndata = [] for row in some_function_that_yields_data(): data.append(row) # either this df = pd.DataFrame(data) # or, to append to an existing data frame df = pd.concat([df, pd.DataFrame(data)], axis=0).reset_index(drop=True) Function Getting Help To see details of a function in a Jupyter notebook, use the ?? operator. It will show you the body and the associated documentation (if available).\n??my_function() Modify Inplace When you modify a mutable object passed to a function, the function modifies that object in place. For example, a function like the sketch below would change the value of the list passed to it.
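# a minimal illustration (add_item and my_list are hypothetical names) def add_item(items): items.append(\u0026#34;new\u0026#34;) # lists are mutable: this changes the caller\u0026#39;s list, no return needed my_list = [\u0026#34;old\u0026#34;] add_item(my_list) print(my_list) # [\u0026#39;old\u0026#39;, \u0026#39;new\u0026#39;]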
Python Environments See Managing environments for details. Use conda for managing environments and packages.\nGenerally, avoid messing up with the base environment. Create a new one and use that.\nOnce you create a new environment, remember to refresh Visual Studio Code\u0026rsquo;s list of Python interpreters.\nList all environments conda env list Here is how it looks for me.\n(base) harshvardhan@harshmac17 ~ % conda env list # conda environments: # base * /Users/harshvardhan/opt/anaconda3 env_oct22 /Users/harshvardhan/opt/anaconda3/envs/env_oct22 Create an environment This will create an environment called new_env.\nconda create --name new_env Conda will ask you for permission: proceed ([y]/n)?. Type y.\nActivate an environment This will activate the environment called new_env. Remember that you can list all environments with conda env list.\nconda activate new_env VS Code Note: By default, the Python extension looks for and uses the first Python interpreter it finds in the system path. To select a specific environment, use the Python: Select Interpreter command from the Command Palette (⇧⌘P).\nClone an environment This will create a new environment new_env with the exact same packages as old_env.\nconda create --clone old_env --name new_env Remove an environment This will remove a Python environment called old_env.\nconda env remove --name old_env List all packages This will list all packages in the current environment.\nconda list Update all packages This will update all packages in the activated Python environment. See above to list environments and select an environment. Once you are in the activated environment, execute the following.\nconda update --all Using pip in Conda This is not recommended. If you are feeling adventurous, give it a try! See Using pip in an environment. Also see a note on dependency conflicts.\n","permalink":"/notes-on-python/","summary":"Some things that I\u0026rsquo;ll likely forget","title":"Cheatsheet on Python"},{"content":"\nWhen I first saw Sirius Black\u0026rsquo;s family tree in Harry Potter: Prisoner of Azkaban many years ago, I thought of creating my own. I started mapping and went two generations back \u0026mdash; that was all that my family remembered. Mummy said, \u0026ldquo;People remember two generations up and two generations down. That\u0026rsquo;s it.\u0026rdquo; I was determined, but there was no obvious way.\nSome pundits in our ancestral village had a record of our family tree that we could access after paying a nominal fee. I was excited, but my father was not. \u0026ldquo;Too much travel for too little information\u0026rdquo;, he said. I never expected he might be correct.\nAround 30 million people have purchased DNA test kits to trace their ancestry. However, there are so many \u0026ldquo;fine print\u0026rdquo; details that it is almost impossible to know how accurate they are.\nThere are two ways to trace ancestry. First is genealogy, where you trace your family tree \u0026mdash; like I tried. You build a chart of your parents, grandparents and so on. The graph would grow at an exponential rate in a normal situation: \\(2^n\\). Currently, I am at \\(n = 0\\). It\u0026rsquo;s only me. A generation before me would be two people, \\(2^1\\); a generation before them would be four people, \\(2^2\\), and so on.\nThe second method is to track the genome. My genome constitutes 47 stretches of DNA: 23 chromosomes from each parent and one mitochondrial DNA from my mother. There are 118 DNA fragments that I got from Papa. When I track my ancestry this way, I get 189 pieces from my grandparents, 260 from my great grandparents and so on.1\nThese genome contributions look promising at first. One hundred eighteen stretches of DNA vs two genealogical parents; 189 stretches of DNA vs four genealogical parents. But what our human mind misses is that the genealogical count grows exponentially while the genomic count grows only linearly.\nWe have more information from our genealogy around the tenth generation than our DNA. By the fourteenth generation, the ratio is 16 to 1, meaning many of my ancestors didn\u0026rsquo;t pass me any DNA at all! So tell me, what do we know from DNA?\nConsidering my ancestors lived for 60 years on average, that gives me about a thousand years of genetic information from my DNA. But not all of that is reliable.
We are 50% certain about our DNA results at just four generations down the line. Using DNA for my ancestry is worse than a coin toss just three hundred years down the line.\nWith a little extra effort and emotional pleading to Papa, I can get at least five or six generations\u0026rsquo; worth of information. Why bother with DNA ancestry at all?\nIt is common knowledge that DNA can store more information than modern computers. A curious soul would naturally ask: where did we lose this ancestral information? Partly because of the mathematics of how DNA transfer happens and how much is retained in the child. Not all information is ancestral, and not all information is copied. It is also presumptuous to assume most of the information in our DNA is about ancestry. Nature doesn\u0026rsquo;t care about our parents; it cares about survival.\nBut there is a more interesting answer too. Our DNA gains permanent and non-permanent changes during our lifetime due to our behaviour. We can control what information is passed on to future generations. This recent discovery about epigenetics has revolutionised many aspects of genetic study.\nIf our behaviours can change the encoding in the fundamental unit of life, imagine the possibilities for future generations.\nIf most DNA information is lost after just a few generations, we should concentrate our efforts on maintaining our own genealogy directory. In the long run, physical directories retain more ancestral information than DNA. We should also take the results of genome DNA kits with a pinch of salt.\nSo to answer the question, \u0026ldquo;how much can we know about our ancestors?\u0026rdquo; Well, probably not a lot.\nThese pages are reproduced from James Cheshire and Oliver Uberti\u0026rsquo;s Atlas of The Invisible: Maps and Graphs That Will Change How You See The World.\nPartial Inheritance: DNA kits do not tell the full story of you. Ancient DNA shows nationalism is only a state of mind. Much of this information follows from James Cheshire and Oliver Uberti\u0026rsquo;s Atlas of The Invisible: Maps and Graphs That Will Change How You See The World. Their original comment is reproduced at the end for completion.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/ancestors/","summary":"Genome, Genealogy and DNA Tests","title":"How much do we know about our ancestors?"},{"content":"On one fine day when I have enough time, they\u0026rsquo;ll all be wrapped into a package hosted on my Github.
Until then, this page is their home.\nlibrary(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✔ ggplot2 3.3.6.9000 ✔ purrr 0.3.4 ## ✔ tibble 3.1.7 ✔ dplyr 1.0.9 ## ✔ tidyr 1.2.0 ✔ stringr 1.4.1 ## ✔ readr 2.1.2 ✔ forcats 0.5.1 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() DT Table with Download Buttons and Search my_DT = function(df) { return(DT::datatable( df, extensions = \u0026#34;Buttons\u0026#34;, options = list( paging = TRUE, scrollX = TRUE, searching = TRUE, ordering = TRUE, dom = \u0026#39;Bfrtip\u0026#39;, buttons = c(\u0026#39;copy\u0026#39;, \u0026#39;csv\u0026#39;, \u0026#39;excel\u0026#39;, \u0026#39;pdf\u0026#39;), pageLength = 5, lengthMenu = c(3, 5, 10) ) )) } Show in Excel show_in_excel = function(.data) { temp = paste0(tempfile(), \u0026#34;.csv\u0026#34;) write.csv(.data, temp) fs::file_show(path = temp) } This can be used with pipes too.\niris %\u0026gt;% show_in_excel() Convert Missing Values to Zero This function converts missing values in a vector to zero.\nn2z = function(x) { x = ifelse(is.na(x), 0, x) return(x) } Finding Index of Missing Elements This function returns the indices of elements which are missing. Very useful in finding which observations are missing.\nwhich.na = function(x) { return(which(is.na(x))) } Removing Rows Based on Missing Values in a Column Sometimes, I do not want to na.omit() because it will treat all features equally. I want to check values only for one column.\nna.rm.feature = function(x, colname) { nas = which(is.na(x[, colname])) if (length(nas) == 0) return(x) # guard: x[-integer(0), ] would otherwise drop every row x = x[-nas, ] return(x) } Find row where a condition is satisfied This function can find observations that satisfy a condition. Typically, they are useful in finding specific elements. It is kind of a wrapper around dplyr\u0026rsquo;s filter().\nwhich.this = function(df, x) { df %\u0026gt;% filter(eval(parse(text = x))) } Example which.this(iris, \u0026#34;Sepal.Length \u0026gt; 6.5\u0026#34;) %\u0026gt;% head() ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 7.0 3.2 4.7 1.4 versicolor ## 2 6.9 3.1 4.9 1.5 versicolor ## 3 6.6 2.9 4.6 1.3 versicolor ## 4 6.7 3.1 4.4 1.4 versicolor ## 5 6.6 3.0 4.4 1.4 versicolor ## 6 6.8 2.8 4.8 1.4 versicolor Remove commas, dollars, or any other such characters The code below replaces all commas with nothing.\nx = \u0026#34;300,000\u0026#34; x = gsub(\u0026#34;,\u0026#34;, \u0026#34;\u0026#34;, x) print(x) ## [1] \u0026#34;300000\u0026#34;
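The same idea handles dollar signs, or both at once, via a character class; a small sketch extending the snippet above (not run here):\n# \u0026#34;[$,]\u0026#34; matches either a dollar sign or a comma x = gsub(\u0026#34;[$,]\u0026#34;, \u0026#34;\u0026#34;, \u0026#34;$300,000\u0026#34;) as.numeric(x) # convert the cleaned string to a number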
GGPlot2 Themes See the official guide for more details. Also see Benjamin\u0026rsquo;s blog.\nThe default plot looks like this.\niris %\u0026gt;% ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() + labs(title = \u0026#34;Without my theme\u0026#34;) Once I run and set my theme, it\u0026rsquo;s way prettier.\n# creating theme theme_h = function(base_size = 14) { theme_bw(base_size = base_size) %+replace% theme( # Specify plot title plot.title = element_text( size = rel(1), face = \u0026#34;bold\u0026#34;, family = \u0026#34;serif\u0026#34;, margin = margin(0, 0, 5, 0), hjust = 0 ), # Specifying grid and border panel.grid.minor = element_blank(), panel.border = element_blank(), # Specify axis details axis.title = element_text( size = rel(0.85), face = \u0026#34;bold\u0026#34;, family = \u0026#34;serif\u0026#34; ), axis.text = element_text(size = rel(0.70), family = \u0026#34;serif\u0026#34;), axis.line = element_line( color = \u0026#34;black\u0026#34;, arrow = arrow(length = unit(0.3, \u0026#34;lines\u0026#34;), type = \u0026#34;closed\u0026#34;) ), # Specify legend details legend.title = element_text( size = rel(0.85), face = \u0026#34;bold\u0026#34;, family = \u0026#34;serif\u0026#34; ), legend.text = element_text( size = rel(0.70), face = \u0026#34;bold\u0026#34;, family = \u0026#34;serif\u0026#34; ), legend.key = element_rect(fill = \u0026#34;transparent\u0026#34;, colour = NA), legend.key.size = unit(1.5, \u0026#34;lines\u0026#34;), legend.background = element_rect(fill = \u0026#34;transparent\u0026#34;, colour = NA), # Remove default background strip.background = element_rect(fill = \u0026#34;#17252D\u0026#34;, color = \u0026#34;#17252D\u0026#34;), strip.text = element_text( size = rel(0.85), face = \u0026#34;bold\u0026#34;, family = \u0026#34;serif\u0026#34;, color = \u0026#34;white\u0026#34;, margin = margin(5, 0, 5, 0) ) ) } theme_set(theme_h()) iris %\u0026gt;% ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point() + labs(title = \u0026#34;With my theme\u0026#34;) I like the arrowed axes and serif fonts. This theme has now been implemented in my garlic package. It can be set via theme_set(garlic::gg_serif()).\nTheme Clean There are many other alternatives available \u0026mdash; beyond the default options. This website has a wonderful compilation of a few of them. I really like theme_clean() from the ggthemes package.\ntheme_set(ggthemes::theme_clean()) Tech Themes ggtech has themes related to tech companies. Here they are, in order of my preference.\ntheme_set(ggtech::theme_airbnb_fancy()) theme_set(ggtech::theme_tech(theme=\u0026#34;etsy\u0026#34;)) theme_set(ggtech::theme_tech(theme=\u0026#34;google\u0026#34;)) theme_set(ggtech::theme_tech(theme=\u0026#34;facebook\u0026#34;)) theme_set(ggtech::theme_tech(theme=\u0026#34;twitter\u0026#34;)) Better Quality Images in R Markdown Using .svg as the image output format gives much better graphics quality than the default option. To use it, include the following code in R Markdown. Source.\n# set output device to svg # this can fail sometimes -- I haven\u0026#39;t investigated when knitr::opts_chunk$set(dev = \u0026#39;svg\u0026#39;) Update (March 5, 2022): I finally wrote a package with some of these functions. You can learn more about it here.1\nI do not imagine this package will be useful to many people, but I use these functions very frequently, particularly my ggplot2 theme.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/notes-on-r/","summary":"Functions et al","title":"Notes on R"},{"content":"Apple certainly improves how productive you are. 
I have lost an uncountable number of hours (cumulatively) just switching on my Windows machine \u0026mdash; something that\u0026rsquo;s almost instantaneous on a Mac. I am not entirely satisfied with iPhones, but I\u0026rsquo;m not sure if that\u0026rsquo;s due to my muscle memory of how Android works or the iPhone itself. Here are specific applications on Mac that significantly boost my productivity \u0026mdash; even for rare use cases.\nStatistical computing requires a powerful machine: a fast processor and plenty of RAM are needed for most high-level tasks these days. Thankfully, my Mac can handle most of it. I\u0026rsquo;m happy and sad that machine learning isn\u0026rsquo;t considered the game-changer it used to be, although it is still revolutionary. Probably it is just another example of the Gartner Hype Cycle.\nAmphetamine Keep awake utility Amphetamine helps me close my Mac without stopping the computations. It gives me an option to switch off displays \u0026mdash; which I do \u0026mdash; while running the machine in the background. So, I can start a lengthy computation, close my laptop, and carry it wherever. Later in the day, I have my results ready when I get back to my Mac. It\u0026rsquo;s best used with Amphetamine Enhancer.\nApp Store\nCheatSheet Hold Command (⌘) key to see all keyboard shortcuts The trackpad is excellent, but keyboard shortcuts are the best. My high school computer teacher, Pathak Sir, used to say: \u0026ldquo;Programmers use the keyboard. Accountants use the mouse.\u0026rdquo; His words stuck with me; I kept longing for more keyboard shortcuts. Saving minutes here and there saved me hours overall, and I\u0026rsquo;m not counting all the mental peace.\nLink\nDropbox File storage and backup Google Drive is free and has more space. But why do I still use Dropbox? Because it always works. It is simple enough to be used for all files. Backing up my research work is effortless. It only gives two gigabytes of storage, but I\u0026rsquo;ve only hit the limit once \u0026mdash; that too when I was storing almost all files on my machine.\nGestimer Pull down and create a timer It is simple. Pull down an icon from the menu bar to set up a reminder. How far you pull determines the reminder time. I think it is a good deal for the price of a coffee.\nApp Store\nGifski Easily convert videos to GIFs Since I started using GIFs in my newsletter, I needed to convert my screen recordings to GIFs. Gifski is a free tool that is neat and does the job quite fast.\nApp Store\nIINA Media player The VLC media player is great \u0026mdash; no complaints. I like IINA mostly because it can automatically download subtitles for what I\u0026rsquo;m watching.\nGithub\nItsycal Menu bar calendar I\u0026rsquo;ll admit I do not use this often. The app adds an option in your menu bar to show you the next two days\u0026rsquo; events (customisable). If there is an associated Zoom or Google Meet link, it\u0026rsquo;ll show a button alongside as well. On the rare occasions that I do use it (like when I\u0026rsquo;m sharing my screen on Zoom and don\u0026rsquo;t want to pause sharing by opening my Calendar), I find it incredibly useful.\nWebsite\nLunar Brightness controller for external displays Lunar gives you the option to control the brightness of external displays. If your eyes are tired of too much light, consider trying this app before visiting a doctor.\nWebsite\nPandan Time awareness tool It tells me to take a break without telling me to take a break. 
It shows me how long I have been staring at the screen continuously (right now, it\u0026rsquo;s 4h 26m) and leaves the decision to take breaks to me. I have found this more effective than Pomodoro and its ilk, probably because I feel I am in control.\nApp Store\nRaycast Replacement of Spotlight Spotlight is great. Alfred added some additional features to improve it. Raycast added many more and made it free and open source. Mini-apps allow you to start a Zoom meeting, search and play a song on Spotify, see Github issues, and much more. The clipboard history tool is terrific too.\nWebsite\nRectangle Move and resize windows Rectangle is a super helpful tool for resizing windows. There\u0026rsquo;s really not a lot to explain. There\u0026rsquo;s also a paid alternative with trackpad gestures that I\u0026rsquo;ve not tried.\nGithub\nSpark Email client I prefer Spark over Mail because it is faster and allows me to undo sending an email. The way it handles email signatures is also better than how Mail does it. It shows me all options, not only those associated with an email address.\nWebsite\nUnarchiver Archive tool You need an app that can decompress .rar files (or literally anything other than .zip).\nWebsite\n","permalink":"/my-mac-workflow/","summary":"Apps that take me from 0 to 1","title":"My Mac Workflow"},{"content":"Text in R can be represented in several ways, but it is generally a character vector (strings). Reading a text file usually means the content ends up either in one long character vector or broken into variables and observations in a data frame, as with comma-separated (CSV) files. In this blog tutorial, I will download one of Jane Austen\u0026rsquo;s books and perform some basic analysis to understand how these text functions work.\nPackages The common packages for text mining in R are stringr, tidytext, tidyverse and quanteda. I will also use gutenbergr to download the book for analysis.\nlibrary(stringr) library(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✔ ggplot2 3.3.6.9000 ✔ purrr 0.3.4 ## ✔ tibble 3.1.7 ✔ dplyr 1.0.9 ## ✔ tidyr 1.2.0 ✔ forcats 0.5.1 ## ✔ readr 2.1.2 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() library(tidytext) library(quanteda) ## Package version: 3.2.1 ## Unicode version: 14.0 ## ICU version: 70.1 ## Parallel computing: 8 of 8 threads used. ## See https://quanteda.io for tutorials and examples. library(gutenbergr) # changing default ggplot theme to minimal theme_set(theme_minimal()) Downloading the Book Once I have the required functions in my namespace, I can download the book using gutenberg_download(). gutenberg_works() gives a list of works that can be downloaded. (gutenberg_metadata will give a list of all books in Project Gutenberg, but we only need the ones that can be downloaded.)\ngutenberg_works(title == \u0026#34;Persuasion\u0026#34;) ## # A tibble: 1 × 8 ## gutenberg_id title author gutenberg_autho… language gutenberg_books… rights ## \u0026lt;int\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;int\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 105 Persuas… Auste… 68 en \u0026lt;NA\u0026gt; Publi… ## # … with 1 more variable: has_text \u0026lt;lgl\u0026gt; I am looking for Persuasion, Jane Austen\u0026rsquo;s last book. R tells me the rights to the book are public and it has text, so it works for my purpose. 
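An aside that is not in the original tutorial: if you do not remember an exact title, gutenberg_works() also accepts filters on other metadata fields, such as the author. The \u0026#34;Austen, Jane\u0026#34; spelling below follows the Project Gutenberg metadata; select() comes from dplyr, already attached via tidyverse.
# list all downloadable works by Jane Austen
gutenberg_works(author == \u0026#34;Austen, Jane\u0026#34;) %\u0026gt;% select(gutenberg_id, title)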
Downloading the book requires its gutenberg_id, which is 105 for Persuasion, as seen in the previous output.\nbook = gutenberg_download(105) ## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest ## Using mirror http://aleph.gutenberg.org ## Warning in .f(.x[[i]], ...): Could not download a book at http:// ## aleph.gutenberg.org/1/0/105/105.zip ## Warning: Unknown or uninitialised column: `text`. I can download more than one book at a time and do many other fancy things. Check gutenbergr\u0026rsquo;s vignette for more information.\nExploring the Book Let\u0026rsquo;s see what we have in book.\nbook ## # A tibble: 0 × 2 ## # … with 2 variables: gutenberg_id \u0026lt;int\u0026gt;, text \u0026lt;chr\u0026gt; The book object has two variables: gutenberg_id and text. Unless you are downloading multiple books, text is the only useful variable.\nAlso note that there are 8,328 rows in the dataset. However, this text is not in tidytext format, where each row identifies a token and each column is a variable. (An easy way to remember the format is to repeat out loud \u0026ldquo;One Token Per Document Per Row\u0026rdquo; as often as you can.)\nTo convert it into tidytext format, I will use the unnest_tokens() function from the tidytext package.\nbook %\u0026gt;% unnest_tokens(word, text) ## # A tibble: 0 × 2 ## # … with 2 variables: gutenberg_id \u0026lt;int\u0026gt;, word \u0026lt;chr\u0026gt; unnest_tokens used here has two parameters: what you want to convert into and what you want to convert. First we have the output column name that will be created as the text is unnested into it (word, in this case), and then the input column that the text comes from (text, in this case).\nThe function also did some other operations in the background. It removed all the punctuation marks from the text. It also converted everything to lower case (which can be toggled OFF by using to_lower = FALSE in unnest_tokens). The function also has an argument token to specify what kind of text it is. words is the default option, which worked for our case. Other options are characters, character_shingles, ngrams, skip_ngrams, sentences, lines, paragraphs, regex, tweets and ptb. (I sketch the ngrams option at the end of this section.)\nExploring Words We can manipulate the words in several ways for insights. For example, how many four-letter words did she use? How many shorter than four letters? Longer than ten letters?\nbook = book %\u0026gt;% unnest_tokens(word, text) # Four Letter Words book %\u0026gt;% filter(str_length(word) == 4) ## # A tibble: 0 × 2 ## # … with 2 variables: gutenberg_id \u0026lt;int\u0026gt;, word \u0026lt;chr\u0026gt; # Less than four letter words book %\u0026gt;% filter(str_length(word) \u0026lt; 4) ## # A tibble: 0 × 2 ## # … with 2 variables: gutenberg_id \u0026lt;int\u0026gt;, word \u0026lt;chr\u0026gt; # More than ten letters book %\u0026gt;% filter(str_length(word) \u0026gt; 10) ## # A tibble: 0 × 2 ## # … with 2 variables: gutenberg_id \u0026lt;int\u0026gt;, word \u0026lt;chr\u0026gt; We see that there are 15,505 words that have exactly four letters. 37,908 have fewer than four letters (that includes numbers such as 1). There are 1,636 words that have more than ten letters in them. 
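Here is the promised sketch of the ngrams option. Since book has already been unnested into single words at this point, the sketch assumes a fresh download of the text.
# bigram tokenization: two-word sequences instead of single words
raw_book = gutenberg_download(105)
raw_book %\u0026gt;% unnest_tokens(bigram, text, token = \u0026#34;ngrams\u0026#34;, n = 2) %\u0026gt;% count(bigram, sort = TRUE)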
Words that Start or End with \u0026hellip; We can also find words that start or end with a particular string. For example, I wonder how often Jane Austen uses the V4 form of a verb \u0026mdash; ending in \u0026ldquo;ing\u0026rdquo;? We can use str_ends() from the stringr package.\nbook %\u0026gt;% filter(str_ends(word, \u0026#34;ing\u0026#34;)) ## # A tibble: 0 × 2 ## # … with 2 variables: gutenberg_id \u0026lt;int\u0026gt;, word \u0026lt;chr\u0026gt; She uses 2,638 words that end with \u0026ldquo;ing\u0026rdquo;. I\u0026rsquo;m curious, what are their frequencies? I only need to add count() at the end.\nbook %\u0026gt;% filter(str_ends(word, \u0026#34;ing\u0026#34;)) %\u0026gt;% count(word, sort = T) ## # A tibble: 0 × 2 ## # … with 2 variables: word \u0026lt;chr\u0026gt;, n \u0026lt;int\u0026gt; \u0026ldquo;Being\u0026rdquo; and \u0026ldquo;nothing\u0026rdquo; are the most often used (no pun intended). What about words that start with \u0026ldquo;h\u0026rdquo;? I can use str_starts() from the stringr package for this.\nbook %\u0026gt;% filter(str_starts(word, \u0026#34;h\u0026#34;)) %\u0026gt;% count(word, sort = T) ## # A tibble: 0 × 2 ## # … with 2 variables: word \u0026lt;chr\u0026gt;, n \u0026lt;int\u0026gt; They\u0026rsquo;re mostly pronouns. How many times does \u0026ldquo;gh\u0026rdquo; appear in her texts and in which words? (If I recall correctly, \u0026ldquo;gh\u0026rdquo; is probably one of the most common letter pairs in English.)\nbook %\u0026gt;% filter(str_detect(word, fixed(\u0026#34;gh\u0026#34;))) %\u0026gt;% count(word, sort = T) ## # A tibble: 0 × 2 ## # … with 2 variables: word \u0026lt;chr\u0026gt;, n \u0026lt;int\u0026gt; I did this using the str_detect() function from stringr. This function usually looks for regular expressions. Since there was a fixed string that I was looking for (gh), I used fixed() to tell R exactly what I wanted. It will not do pattern matching, only exact fixed matches. I\u0026rsquo;m fairly naive with regular expressions, but a good starting guide is Hadley Wickham\u0026rsquo;s R for Data Science chapter on Strings.\nI can also look for words that start and end with certain letters. How? Just add another condition in the filter() statement. Let\u0026rsquo;s look for words Jane used that start and end with \u0026ldquo;t\u0026rdquo;.\nbook %\u0026gt;% filter(str_starts(word, \u0026#34;t\u0026#34;) \u0026amp; str_ends(word, \u0026#34;t\u0026#34;)) %\u0026gt;% count(word, sort = T) ## # A tibble: 0 × 2 ## # … with 2 variables: word \u0026lt;chr\u0026gt;, n \u0026lt;int\u0026gt; The most common such word is \u0026ldquo;that\u0026rdquo;, followed by \u0026ldquo;thought\u0026rdquo;.\nFrequency Distribution Plots We saw how adding count(word, sort = T) created the frequency distribution. We can also visualise the counts.\nFrequency Table book %\u0026gt;% count(word, sort = T) %\u0026gt;% head(20) %\u0026gt;% mutate(word = reorder(word, n)) ## # A tibble: 0 × 2 ## # … with 2 variables: word \u0026lt;fct\u0026gt;, n \u0026lt;int\u0026gt; Frequency Plot I will have to reorder the counts to create the plot, as count() only counts and doesn\u0026rsquo;t change the order of the tibble.\nbook %\u0026gt;% count(word, sort = T) %\u0026gt;% head(20) %\u0026gt;% mutate(word = reorder(word, n)) %\u0026gt;% ggplot(aes(x = n, y = word)) + geom_col() + xlab(\u0026#34;Count\u0026#34;) + ylab(\u0026#34;Word\u0026#34;) 
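One refinement that is not in the original post: raw frequency counts are dominated by function words like \u0026ldquo;the\u0026rdquo; and \u0026ldquo;and\u0026rdquo;. tidytext ships a stop_words table, and an anti-join drops those words before counting.
# remove common stop words before counting
book %\u0026gt;% anti_join(stop_words, by = \u0026#34;word\u0026#34;) %\u0026gt;% count(word, sort = TRUE) %\u0026gt;% head(10)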
Finding Hapaxes Hapaxes are words that occur only once in the text. Nothing complicated; I will first count the occurrences and then filter where the count is 1.\nbook %\u0026gt;% count(word, sort = T) %\u0026gt;% filter(n == 1) ## # A tibble: 0 × 2 ## # … with 2 variables: word \u0026lt;chr\u0026gt;, n \u0026lt;int\u0026gt; These are all numbers. What about words?\nbook %\u0026gt;% count(word, sort = T) %\u0026gt;% filter(n == 1) %\u0026gt;% filter(!str_detect(word, \u0026#34;[0-9]\u0026#34;)) ## # A tibble: 0 × 2 ## # … with 2 variables: word \u0026lt;chr\u0026gt;, n \u0026lt;int\u0026gt; I have used a regular expression here to identify all the words that didn\u0026rsquo;t have any numerals.\nDistribution of Word Lengths Some writers have a habit of writing long words. What were the longest words used by Jane and how often did she use them?\nbook %\u0026gt;% mutate(length = str_length(word)) %\u0026gt;% count(length, sort = T) ## # A tibble: 0 × 2 ## # … with 2 variables: length \u0026lt;int\u0026gt;, n \u0026lt;int\u0026gt; Three-letter words are most commonly used, followed by four-letter and two-letter ones. I first calculated the length of words using mutate() and str_length().\nI can also plot them.\nbook %\u0026gt;% mutate(length = str_length(word)) %\u0026gt;% count(length, sort = T) %\u0026gt;% mutate(length = reorder(length, n)) %\u0026gt;% ggplot(aes(x = length, y = n)) + geom_col() + xlab(\u0026#34;Length of Word\u0026#34;) + ylab(\u0026#34;Count\u0026#34;) That was all! See you next week when I try some harder text analysis tools.\nP.S. I have used the words \u0026ldquo;word(s)\u0026rdquo; and \u0026ldquo;token(s)\u0026rdquo; quite liberally. They are not always the same. As the token argument in unnest_tokens suggests, there are many options besides words that can be tokens.\n","permalink":"/basics-of-text-mining-in-r/","summary":"Thinking of Text as List of Words","title":"Basics of Text Mining in R"},{"content":" RStudio Github is the most popular method of version control in RStudio, and I have struggled with managing it. In this section, I will put together pieces from different sources to fix your issues or get you started.\nStarting a New Project Step 1: Create a Github Repository Go to github.com and create a new repository. Copy its link.\nIt could be private or public depending on your whims. Public repositories are public \u0026mdash; other users can see and fork them. Forking is a fancy term for copying to their own Github. Private repositories are visible only to you. I\u0026rsquo;d also recommend adding a README file; the project description goes here. Once you create your repository, copy the link to it: not the SSH link but the link visible in your browser.\nStep 2: Create a New R Project In RStudio, head over to File -\u0026gt; New Project. In that, select the option for version control (git). Paste the link to your newly created repository.\nOnce you do that, the newly created project will be synced with the Github repository. You can make changes and then \u0026ldquo;commit\u0026rdquo; to your online repository.\nStep 3: Commit Changes! The last step is to create the files that you want.\nFinally, commit the changes.\nIn the top right corner of RStudio, tick all the changes you want to commit. Then add a message to the \u0026ldquo;commit message\u0026rdquo; box in the top right corner. This message is a record of the changes you made. (My advice is to use this to explain your intent rather than explain the changes. Changes are trivial to trace thanks to Git, but \u0026ldquo;why\u0026rdquo; is easy to forget.)
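If you prefer the console to the RStudio Git pane, the same add-commit-push loop can be scripted from R. This is my sketch, not part of the original notes; it assumes the gert package is installed, and the commit message is purely illustrative.
# stage, commit and push from R using the gert package
library(gert)
git_add(\u0026#34;.\u0026#34;)                                  # stage everything in the project
git_commit(\u0026#34;Explain why the change was made\u0026#34;) # the \u0026#34;why\u0026#34;, not the \u0026#34;what\u0026#34;
git_push()                                     # send it to the Github remote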
You\u0026rsquo;ll have the results available (almost) immediately.\nHandling Passwords If you are starting a project, jump to this. If you had a project that used passwords for commits (discontinued today, Aug 13 2021), jump to the first article. If it still does not resolve your issues, try the next three in that order.\nSimplest Guide on Github with R This literally solves more than 80 per cent of my problems. Undo git commit in RStudio for more than 100 mb files Every once in a while, you will end up trying to upload something in your local files that you do not want to commit to the Github repository. How do you release that commit from RStudio? Github has a hard limit of 100 mb per file, and my commits occasionally fail because of it. Analysis paralysis.\nTL;DR Ensure you only have the commits that you want to delete in the Git pane. Then go to the Git tab. Click on the (machine-like icon) and select \u0026ldquo;Shell\u0026hellip;\u0026rdquo;. In Shell, type: git reset HEAD~. Run the command once for every commit you want to undo. Voila. Full Steps First, ensure you have only the changes you do not want to commit. If I try committing a .key file, it will fail as it\u0026rsquo;s larger than 100 mb.\nI want to commit the last two changes (something about this blog post), so I will commit them and push the changes to the repository. Then, I will go to the (machine-like icon) and click on \u0026ldquo;Shell\u0026hellip;\u0026rdquo;.\nNow, I will paste the magic spell in the Terminal window: git reset HEAD~.\nPython Use nbdime for diffing and merging Jupyter notebooks. The Github desktop app is soon going to use it as the default diff tool for Jupyter notebooks.1\nIf you have a better suggestion on how to handle Python notebooks in Github, I\u0026rsquo;m all ears.\nSome additional resources https://rfortherestofus.com/2021/02/how-to-use-git-github-with-r/\nhttps://happygitwithr.com/https-pat.html#store-pat\nhttps://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/\nIt annoys me that .ipynb is not plain text. Making it a plain text file would make it much easier to handle. Plus, we wouldn\u0026rsquo;t require an additional tool like JupyterLab/JupyterHub for viewing or editing. Why didn\u0026rsquo;t they make it a plaintext file? I might be technically wrong on this one, but one benefit of the current format might be plugins, though those could easily be delivered as notebook extensions, like in the RStudio visual editor.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"/notes-on-github/","summary":"Tricks of trade I learnt along the way","title":"Notes on Github"},{"content":"Animated Slides: Keynote\nStatic Slides: PDF\nSession Recording: YouTube\nI also demonstrate Owlstown\u0026rsquo;s tool \u0026mdash; which I found to be the most beginner-friendly. It starts at 27:00 in the above YouTube video.\nWebsites used to be developed by groups of people to meet the needs of other groups of people. Today, as the internet grows more personalised than an encyclopedia of information, I argue we need more personal websites. Social media platforms are limited in how they treat your content. Your message might be curtailed by what LinkedIn allows or 280 characters on Twitter. Contrary to what many think, maintaining a personal website is neither difficult nor expensive. Unfortunately, creating a website is approached as a \u0026ldquo;technology problem\u0026rdquo; to be solved. 
Projects are coloured from the beginning by enthusiasm for, or fear of, HTML, CSS and other fancy jargon \u0026mdash; when it doesn\u0026rsquo;t have to be so.\nI was thrilled to present this talk at the Trenton R Users group. Generally speaking, I discussed ideas on why having a personal website is critical and how one can easily create one.\nPoster @harshbutjust gave a talk for academics on creating and controlling your digital identity.\nVideo: https://t.co/5hxCGhiRtZ\nDetails with links to slides and PDF: https://t.co/EasqyZfIfA#AcademicChatter #AcademicTwitter @PhDVoice\n\u0026mdash; Owlstown - Academic Website Builder (@owlstown) February 4, 2022 ","permalink":"/iweb/","summary":"Slides, recorded lecture and additional resources around my talk on how to create and control your digital identity.","title":"I Web, Therefore I Exist"},{"content":"Most of the time when I answer questions on Stack Overflow, I end up learning a thing or two about R myself. Answering questions gets me warmed up on unrelated topics. Each of these questions is one of a kind. This blog post documents all my answers so that I can find them readily.\nThe headings explain the basic logic of what I am trying to achieve in each question.\nVisualisation This question was on combining time series plots using par(). par() won\u0026rsquo;t work because plot.decomposed.ts() (which we implicitly call when calling plot()) isn\u0026rsquo;t designed to work that way. The most straightforward alternative is to use autoplot() from the forecast package to generate decomposition plots and combine them using patchwork.\nHere is an example.\n# Loading forecast and patchwork library(forecast) ## Registered S3 method overwritten by \u0026#39;quantmod\u0026#39;: ## method from ## as.zoo.data.frame zoo library(patchwork) m1 = decompose(co2) m2 = decompose(AirPassengers) m3 = decompose(UKgas) p1 = autoplot(m1) p2 = autoplot(m2) p3 = autoplot(m3) p1 / p2 / p3 The last line, p1 / p2 / p3, tells R to stack them vertically. If you want to stack them horizontally, use p1 + p2 + p3. If you\u0026rsquo;re being feisty, you can also try (p1 + p2)/p3 to stack the first two horizontally and the last one beneath them.\nUsing gghighlight to highlight a line plot in ggplot2 gghighlight provides gghighlight(), which can be used to selectively highlight some lines, points or other geoms. I couldn\u0026rsquo;t get the dataset in the question working, so I generated a random dataset. The code should work for their case as well.\nlibrary(gghighlight) ## Loading required package: ggplot2 year = 1970:2020 value = rnorm(length(year), 2000, 5) x = c(\u0026#34;A\u0026#34;, \u0026#34;B\u0026#34;, \u0026#34;C\u0026#34;, \u0026#34;D\u0026#34;, \u0026#34;E\u0026#34;) variable = sample(x, length(year), replace = T) df = data.frame(year = year, value = value, variable = variable) Now is the cool part.\ndf %\u0026gt;% ggplot(aes(x = year, y = value, colour = variable)) + geom_line() + gghighlight(variable == \u0026#34;A\u0026#34;) + theme_minimal() ## Warning: Tried to calculate with group_by(), but the calculation failed. ## Falling back to ungrouped filter operation... ## Warning: Using `across()` in `filter()` is deprecated, use `if_any()` or ## `if_all()`. ## label_key: variable Voila!\nHistogram with ggplot2 I had to clarify several things. First, what made them choose bins = 43? Second, is providing the scale manually necessary? If the data is in the right format, they should not need it. 
If it isn\u0026rsquo;t, they should first transform the data and then do it.\nThird, gray background (which they wanted to change) is from the default theme. There are several options but I like minimal. minimal, linedraw and bw have white grids.\n(I\u0026rsquo;m generating 1000 random numbers for this example.)\nlibrary(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── ## ✓ tibble 3.1.6 ✓ dplyr 1.0.7.9000 ## ✓ tidyr 1.1.4 ✓ stringr 1.4.0 ## ✓ readr 2.0.2 ✓ forcats 0.5.1 ## ✓ purrr 0.3.4 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() v = tibble(var = rnorm(1000)) ggplot(v, aes(x = var)) + geom_histogram(bins = 20) + theme_minimal() Barplots grouped by two (factor) variables and plotting mean values as points on them The data they posted didn\u0026rsquo;t work so I\u0026rsquo;ve used (modified) iris dataset.\n# loading tidyverse library(tidyverse) # adding another factor variable to replicate this example iris$Variable = rep(LETTERS[1:5], times = 30) Here\u0026rsquo;s the meat.\niris %\u0026gt;% ggplot(aes(x = Species, y = Sepal.Length, fill = Variable)) + geom_boxplot() + stat_summary( fun = mean, color = \u0026#34;steelblue\u0026#34;, position = position_dodge(0.75), geom = \u0026#34;point\u0026#34;, shape = 20, size = 5, show.legend = FALSE ) + theme_minimal() String Manipulations Selecting rows where a string match occurs We can use grepl() from base R for this. grepl() returns True if the word is present and False otherwise.\ntext = \u0026#34;The Little Vanities of Mrs. Whittaker: A Novel\u0026#34; word = \u0026#34;Novel\u0026#34; grepl(word, text) ## [1] TRUE The original_books file (in question) will require large downloads so I\u0026rsquo;m showing an example of searching \u0026ldquo;Plays\u0026rdquo; in title.x of their novels data frame.\nlibrary(gutenbergr) library(tidyverse) gutenberg_full_data \u0026lt;- left_join(gutenberg_works(language == \u0026#34;en\u0026#34;), gutenberg_metadata, by = \u0026#34;gutenberg_id\u0026#34;) gutenberg_full_data \u0026lt;- left_join(gutenberg_full_data, gutenberg_subjects) ## Joining, by = \u0026#34;gutenberg_id\u0026#34; gutenberg_full_data \u0026lt;- subset( gutenberg_full_data, select = -c( rights.x, has_text.x, language.y, gutenberg_bookshelf.x, gutenberg_bookshelf.y, rights.y, has_text.y, gutenberg_bookshelf.y, gutenberg_author_id.y, title.y, author.y ) ) gutenberg_full_data \u0026lt;- gutenberg_full_data[-which(is.na(gutenberg_full_data$author.x)), ] novels \u0026lt;- gutenberg_full_data %\u0026gt;% filter(subject == \u0026#34;Drama\u0026#34;) Here comes the cool part.\nnovels %\u0026gt;% mutate(contains_play = grepl(\u0026#34;Plays\u0026#34;, title.x)) %\u0026gt;% as.data.frame() %\u0026gt;% head() ## gutenberg_id title.x ## 1 1308 A Florentine Tragedy; La Sainte Courtisane ## 2 2270 Shakespeare\u0026#39;s First Folio ## 3 2587 Life Is a Dream ## 4 4970 There Are Crimes and Crimes ## 5 5053 Plays by August Strindberg: Creditors. Pariah. 
## 6 5618 Six Plays ## author.x gutenberg_author_id.x language.x ## 1 Wilde, Oscar 111 en ## 2 Shakespeare, William 65 en ## 3 Calderón de la Barca, Pedro 970 en ## 4 Strindberg, August 1609 en ## 5 Strindberg, August 1609 en ## 6 Darwin, Florence Henrietta Fisher, Lady 1814 en ## subject_type subject contains_play ## 1 lcsh Drama FALSE ## 2 lcsh Drama FALSE ## 3 lcsh Drama FALSE ## 4 lcsh Drama FALSE ## 5 lcsh Drama TRUE ## 6 lcsh Drama TRUE Note that grepl() allows the second argument to be a vector. Thus, using rowwise() is not necessary. If it allowed searching only within a string, we would have to use rowwise().\nData Wrangling and Manipulation Typecasting to Numeric The variable born is registered as a character variable. Convert it to numeric and one should be good to go.\ndat1$born = as.numeric(dat1$born) Now compute the age difference.\nGrabbing columns from one data frame with variable names from another data frame One can do it using any_of() function from dplyr. It selects the variables which match the names and ignores those which do not. I will use a list to store matrices from the loop. They can be accessed using df_modified[[i]].\ndf1=data.frame(q1 = c(1:3), q2 = c(\u0026#34;One\u0026#34; , \u0026#34;Two\u0026#34; , \u0026#34;Three\u0026#34;) , q3 = c(100,231,523), q4 = c(\u0026#34;red\u0026#34;, \u0026#34;green\u0026#34;, \u0026#34;blue\u0026#34;), q1.2 = c(20:22), q2.2 = c(\u0026#34;Six\u0026#34; , \u0026#34;Ten\u0026#34; , \u0026#34;Twenty\u0026#34;) , q3.2 = c(5,900,121), q4.2 = c(\u0026#34;purple\u0026#34;, \u0026#34;yellow\u0026#34;, \u0026#34;white\u0026#34;)) df2=data.frame(x1 = c(\u0026#34;q1\u0026#34; , \u0026#34;q2.1\u0026#34; , \u0026#34;q3.2\u0026#34; , \u0026#34;q4.2\u0026#34;) , x2 = c(\u0026#34;q2\u0026#34; , \u0026#34;q3\u0026#34; , \u0026#34;q3.3\u0026#34; , \u0026#34;q4.4\u0026#34;) , x3 = c(\u0026#34;q3\u0026#34; , \u0026#34;q2.4\u0026#34; , \u0026#34;q3.3\u0026#34; , \u0026#34;q4.6\u0026#34;), x4 = c(\u0026#34;q4\u0026#34; , \u0026#34;q3.6\u0026#34; , \u0026#34;q3.3\u0026#34; , \u0026#34;q4.2\u0026#34;)) # Loading libraries library(tidyverse) df_modified = list() for(i in 1:nrow(df2)) { vars = as.character(df2[i,]) df_modified[[i]] = df1 %\u0026gt;% select(any_of(vars)) } Output\ndf_modified ## [[1]] ## q1 q2 q3 q4 ## 1 1 One 100 red ## 2 2 Two 231 green ## 3 3 Three 523 blue ## ## [[2]] ## q3 ## 1 100 ## 2 231 ## 3 523 ## ## [[3]] ## q3.2 ## 1 5 ## 2 900 ## 3 121 ## ## [[4]] ## q4.2 ## 1 purple ## 2 yellow ## 3 white Done!\nSelecting Observations by Filtering Other Variables One approach is to write a function that does that for you. It matches the first three variables with what you input and returns the index(or indexes) of elements that match.\nwhich() returns the index of items that satisfy the condition. When I say which(df[,1] == a), it will return me the index of observations in df where the first column matches a. And so on. Then, you can use intersect() to find the common indexes in x1, x2 and x3. I\u0026rsquo;m using magrittr pipes %\u0026gt;% to simplify the coding.\ncheck_this = function(df, a, b, c) { x1 = which(df[,1] == a) x2 = which(df[,2] == b) x3 = which(df[,3] == c) v = intersect(x1, x2) %\u0026gt;% intersect(x3) return(v) } Minimum Working Example First, I\u0026rsquo;ll create a dummy data frame. 
Then, I\u0026rsquo;ll find the index using the function I just created.\ndf = tibble(var1 = 1:10, var2 = 11:20, var3 = letters[1:10], var4 = LETTERS[1:10]) df ## # A tibble: 10 × 4 ## var1 var2 var3 var4 ## \u0026lt;int\u0026gt; \u0026lt;int\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 1 11 a A ## 2 2 12 b B ## 3 3 13 c C ## 4 4 14 d D ## 5 5 15 e E ## 6 6 16 f F ## 7 7 17 g G ## 8 8 18 h H ## 9 9 19 i I ## 10 10 20 j J Now, let us see it in action. First, I\u0026rsquo;ll pass the data frame and variables I want to match as arguments. The function will return the indices which I\u0026rsquo;ll store in l. Then, I\u0026rsquo;ll ask R to show me the rows which have indices numbers in l.\n# checking and storing the index of matched l = check_this(df, 2, 12, \u0026#34;b\u0026#34;) df[l,] ## # A tibble: 1 × 4 ## var1 var2 var3 var4 ## \u0026lt;int\u0026gt; \u0026lt;int\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; ## 1 2 12 b B Note: You could have skipped the step of storing indices in l by returning the selected rows of the data frame itself. The function would change to the following.\n# the function check_this = function(df, a, b, c) { x1 = which(df[,1] == a) x2 = which(df[,2] == b) x3 = which(df[,3] == c) v = intersect(x1, x2) %\u0026gt;% intersect(x3) return(df[v,]) } Convert a vector of strings in multiple formats into dates in R My Solution # sample date dates \u0026lt;- c(\u0026#34;2015-02-23\u0026#34;,\u0026#34;2015-02-12\u0026#34;,\u0026#34;2015-18-02\u0026#34;,\u0026#34;2015-25-02\u0026#34;) # libraries library(testit) #for has_warning library(lubridate) #for date functions ## ## Attaching package: \u0026#39;lubridate\u0026#39; ## The following objects are masked from \u0026#39;package:base\u0026#39;: ## ## date, intersect, setdiff, union This function will correct the dates.\ncorrect_dates = function(dates) { dates_new = character() for(i in 1:length(dates)) { #print(i) if(has_warning(day(ydm(dates[i]))\u0026gt;12)) {dates_new = append(dates_new, ymd(dates[i]))} else {dates_new = append(dates_new, ydm(dates[i]))} } return(dates_new) } Let\u0026rsquo;s see it in action.\ndates ## [1] \u0026#34;2015-02-23\u0026#34; \u0026#34;2015-02-12\u0026#34; \u0026#34;2015-18-02\u0026#34; \u0026#34;2015-25-02\u0026#34; correct_dates(dates) ## [1] \u0026#34;2015-02-23\u0026#34; \u0026#34;2015-12-02\u0026#34; \u0026#34;2015-02-18\u0026#34; \u0026#34;2015-02-25\u0026#34; Much Better Solution dates \u0026lt;- c(\u0026#34;2017-12-31\u0026#34;,\u0026#34;2017-12-30\u0026#34;,\u0026#34;2017-29-12\u0026#34;,\u0026#34;2017-28-12\u0026#34;) as.Date(lubridate::parse_date_time(dates, c(\u0026#39;ymd\u0026#39;, \u0026#39;ydm\u0026#39;))) ## [1] \u0026#34;2017-12-31\u0026#34; \u0026#34;2017-12-30\u0026#34; \u0026#34;2017-12-29\u0026#34; \u0026#34;2017-12-28\u0026#34; Table Formatting in kableExtra How to append two tables with same number of columns in kableExtra? I don\u0026rsquo;t know how to combine the tables directly without first joining the data frames. 
However, using pack_rows to specify rows for grouping together should work for this purpose.\nlibrary(kableExtra) ## ## Attaching package: \u0026#39;kableExtra\u0026#39; ## The following object is masked from \u0026#39;package:dplyr\u0026#39;: ## ## group_rows df1 = data.frame(x = c(\u0026#34;a\u0026#34;,\u0026#34;b\u0026#34;), y=1:2) df2 = data.frame(x = c(\u0026#34;c\u0026#34;,\u0026#34;d\u0026#34;), y=3:4) rbind(df1, df2) %\u0026gt;% kbl(format = \u0026#34;latex\u0026#34;, caption = \u0026#34;Combined Tables\u0026#34;) %\u0026gt;% kable_paper(\u0026#34;striped\u0026#34;, full_width = F) %\u0026gt;% pack_rows(\u0026#34;Header 1\u0026#34;, 1, 2) %\u0026gt;% pack_rows(\u0026#34;Header 2\u0026#34;, 3, 4) \\begin{table}\n\\caption{(#tab:unnamed-chunk-21)Combined Tables} \\centering \\begin{tabular}[t]{l|r} \\hline x \u0026amp; y\\ \\hline \\multicolumn{2}{l}{\\textbf{Header 1}}\\ \\hline \\hspace{1em}a \u0026amp; 1\\ \\hline \\hspace{1em}b \u0026amp; 2\\ \\hline \\multicolumn{2}{l}{\\textbf{Header 2}}\\ \\hline \\hspace{1em}c \u0026amp; 3\\ \\hline \\hspace{1em}d \u0026amp; 4\\ \\hline \\end{tabular} \\end{table}\nCheck the documentation of ?pack_rows from kableExtra to modify the group labels, add \\hlines, or other such cosmetic changes.\nSimulation How many people are needed such that there is at least a 70% chance that one of them is born on the last day of December? The question is, \u0026ldquo;How many people are needed such that there is at least a 70% chance that one of them is born on the last day of December?\u0026rdquo;. What they were finding now is \u0026ldquo;How many people are needed such that 70% have their birthdays on the last day of December?\u0026rdquo;. The answer to the second question is close to zero. But the first one is much simpler.\nReplacing prob \u0026lt;- length(which(birthday == 365)) / people with check = any(birthday == 365) in their logic because at least one of them has to be born on Dec 31 will work. Then, they will be able to find if that number of people will have at least one person born on Dec 31.\nAfter that, they will have to rerun the simulation multiple times to generate empirical probability distribution (kind of Monte Carlo). Only then they can check for probability.\nSimulation Code people_count = function(i) { set.seed(i) for (people in 1:10000) { birthday = sample(365, size = people, replace = TRUE) check = any(birthday == 365) if(check == TRUE) { pf = people break } } return(pf) } people_count() function returns the number of people required to have so that at least one of them was born on Dec 31. Then I rerun the simulation 10,000 times.\n# Number of simulations nsim = 10000 l = lapply(1:nsim, people_count) %\u0026gt;% unlist() Let\u0026rsquo;s see the distribution of the number of people required.\nhist(l, main = \u0026#34;Histogram of # People\u0026#34;, xlab = \u0026#34;# People\u0026#34;) To find actual probability, I\u0026rsquo;ll use cumsum().\ncdf = cumsum(l/nsim) which(cdf\u0026gt;0.7)[1] ## [1] 292 So, on average, you would need 292 people to have more than a 70% chance.\n","permalink":"/stackoverflow-answers/","summary":"A collection of my answers on Stackoverflow","title":"Stackoverflow Answers"},{"content":" IndiaPIN contains geographic details about 19,300 PIN codes in India. Some PIN codes had more than one offices. Only the first office of that PIN code area has been retained in those cases. 
(Updated: December 2021.)\nVariables Circle: (chr) Name of the Postal Circle Region: (chr) Name of the Postal Region Division: (chr) Name of the Postal Division Office: (chr) Name of Postal Office PIN: (int) Six-digit PIN Code District: (chr) Name of the District State: (chr) Name of the State Latitude: (dbl) Latitude Longitude: (dbl) Longitude Data Source Department of Posts, Ministry of Communications, Government of India. URL: https://www.indiapost.gov.in/vas/pages/findpincode.aspx. Wrangled for this package by Harshvardhan (https://harsh17.in/).\nInstallation # install `devtools` if not already installed if (!require(\u0026#34;IndiaPIN\u0026#34;)) devtools::install_github(\u0026#34;harshvardhaniimi/IndiaPIN\u0026#34;) ## Loading required package: IndiaPIN ## Warning in library(package, lib.loc = lib.loc, character.only = TRUE, ## logical.return = TRUE, : there is no package called 'IndiaPIN' ## Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead. ## Downloading GitHub repo harshvardhaniimi/IndiaPIN@HEAD ## cli (3.6.4 -\u0026gt; 3.6.5) [CRAN] ## utf8 (1.2.4 -\u0026gt; 1.2.5) [CRAN] ## Installing 2 packages: cli, utf8 ## ## The downloaded binary packages are in ## /var/folders/jw/3b0w1v0s3f990hs35ngj6jl40000gn/T//RtmpAZzBDV/downloaded_packages ## ── R CMD build ───────────────────────────────────────────────────────────────── ## * checking for file ‘/private/var/folders/jw/3b0w1v0s3f990hs35ngj6jl40000gn/T/RtmpAZzBDV/remotesd7c37e2d063/harshvardhaniimi-IndiaPIN-da43b49/DESCRIPTION’ ... OK ## * preparing ‘IndiaPIN’: ## * checking DESCRIPTION meta-information ... OK ## * checking for LF line-endings in source and make files and shell scripts ## * checking for empty or unneeded directories ## * building ‘IndiaPIN_0.0.1.tar.gz’ # Tidyverse library(tidyverse) ## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ── ## ✔ dplyr 1.1.4 ✔ readr 2.1.5 ## ✔ forcats 1.0.0 ✔ stringr 1.5.1 ## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1 ## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ## ✔ purrr 1.0.4 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() ## ℹ Use the conflicted package (\u0026lt;http://conflicted.r-lib.org/\u0026gt;) to force all conflicts to become errors # load IndiaPIN library(IndiaPIN) data(IndiaPIN) Example Data and Variables IndiaPIN ## # A tibble: 18,169 × 9 ## # Groups: PIN [18,169] ## Circle Region Division Office PIN District State Latitude Longitude ## \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;int\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;chr\u0026gt; \u0026lt;dbl\u0026gt; \u0026lt;dbl\u0026gt; ## 1 Andhra Prade… Kurno… Hindupu… Pedda… 515631 ANANTAP… ANDH… 14.6 77.9 ## 2 Andhra Prade… Kurno… Hindupu… Obula… 515581 ANANTAP… ANDH… 14.2 78.3 ## 3 Andhra Prade… Kurno… Hindupu… Gurra… 515571 ANANTAP… ANDH… 13.9 78.2 ## 4 Andhra Prade… Kurno… Hindupu… Halli… 515311 ANANTAP… ANDH… 13.8 77.0 ## 5 Andhra Prade… Kurno… Hindupu… Tamma… 515281 ANANTAP… ANDH… 14.1 77.0 ## 6 Andhra Prade… Kurno… Hindupu… Bussa… 515241 ANANTAP… ANDH… 14.0 77.7 ## 7 Andhra Prade… Vijay… Tadepal… Kavul… 534176 WEST GO… ANDH… 16.6 80.6 ## 8 Bihar Circle East … Bhagalp… Kathr… 813105 BANKA BIHAR 84.5 24.2 ## 9 Bihar Circle East … Bhagalp… Kasri… 813203 BHAGALP… BIHAR 87.3 25.3 ## 10 Bihar Circle East … Bhagalp… Akida… 
853202 BHAGALP… BIHAR 25.4 84.3 ## # ℹ 18,159 more rows Number of PIN codes by State/UT IndiaPIN %\u0026gt;% group_by(State) %\u0026gt;% summarise(Count = n()) %\u0026gt;% arrange(desc(Count)) %\u0026gt;% print(n = 40) ## # A tibble: 35 × 2 ## State Count ## \u0026lt;chr\u0026gt; \u0026lt;int\u0026gt; ## 1 TAMIL NADU 2032 ## 2 UTTAR PRADESH 1581 ## 3 MAHARASHTRA 1466 ## 4 KERALA 1425 ## 5 KARNATAKA 1188 ## 6 WEST BENGAL 1125 ## 7 ANDHRA PRADESH 1071 ## 8 GUJARAT 1007 ## 9 RAJASTHAN 986 ## 10 ODISHA 933 ## 11 BIHAR 853 ## 12 MADHYA PRADESH 760 ## 13 ASSAM 571 ## 14 PUNJAB 531 ## 15 TELANGANA 482 ## 16 HIMACHAL PRADESH 436 ## 17 JHARKHAND 360 ## 18 HARYANA 310 ## 19 UTTARAKHAND 300 ## 20 CHHATTISGARH 240 ## 21 JAMMU AND KASHMIR 195 ## 22 DELHI 97 ## 23 GOA 88 ## 24 CHANDIGARH 25 ## 25 PUDUCHERRY 22 ## 26 SIKKIM 19 ## 27 MEGHALAYA 16 ## 28 TRIPURA 10 ## 29 MIZORAM 9 ## 30 THE DADRA AND NAGAR HAVELI AND DAMAN AND DIU 8 ## 31 LAKSHADWEEP 7 ## 32 ARUNACHAL PRADESH 5 ## 33 NAGALAND 5 ## 34 ANDAMAN AND NICOBAR ISLANDS 4 ## 35 LADAKH 2 PIN Code Locations on Map I will use the leaflet package to plot 50 randomly selected PIN codes. I am adding the Region and Circle names in the popup.\nlibrary(leaflet) library(tidyverse) library(IndiaPIN) data(\u0026#34;IndiaPIN\u0026#34;) set.seed(4) index = sample(nrow(IndiaPIN), 50) data = IndiaPIN::IndiaPIN[index,] l1 = data$Longitude l2 = data$Latitude pop = paste(data$Region, data$Circle, sep = \u0026#34;, \u0026#34;) m = leaflet() %\u0026gt;% addTiles() %\u0026gt;% addMarkers(lng=l1, lat=l2, popup = pop) m Also see this Stackoverflow thread to understand how to save the plots.\nSee Github for the source code.\n","permalink":"/indiapin/","summary":"R Package for All India PIN Codes Directory with Latitude and Longitude Details (Updated: December 2021)","title":"IndiaPIN: R Data Package"},{"content":"I worked on this project with ASAR between April and June 2021 to create a Shiny app. Having a full-time job didn\u0026rsquo;t leave much time for side projects, and this project never reached fruition. This app has limitations \u0026mdash; primarily computational \u0026mdash; but can be helpful to researchers in finding urban and rural populations at the country level, state level, district level, or any of the 30 classes. The limitation is computing power: my laptop, despite its prowess, is not good enough. We probably need a cluster or Google/AWS computational resources.\nHow to use it? Say you want to find how many people in Nigeria live in a rural setting. You need to select Nigeria and set the level to 1. You will see a map and an option to download spreadsheets of urban and rural populations. What about the Abia state of Nigeria? Shift the divider value to 7, and you will have the urban-rural population of every state in Nigeria. It\u0026rsquo;s not complicated. It\u0026rsquo;s slow but not complicated.\nThe divider value represents the level of detail at which you need the data. For a Taluka (in India) or a block (in the US), you would choose 25. For state-level data you would choose 3.\nThe app works for 2020 data but extending it to other years is trivial. I haven\u0026rsquo;t done it yet because rasters quickly explode in size beyond the computational power of my laptop.\nThe Github repository contains the codebase and related datasets/databases for the Population Raster App.\n
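Before walking through how the app works, here is a conceptual sketch of the core aggregation it performs. The app itself predates this code and was built differently; the terra package, the file names and the urban/rural coding below are all my assumptions for illustration.
# sum population counts within each zone of a partitioning raster
library(terra)
pop = rast(\u0026#34;nga_population_2020.tif\u0026#34;) # WorldPop population counts (hypothetical file name)
urc = rast(\u0026#34;nga_urban_rural.tif\u0026#34;)     # partitioning raster, e.g. 1 = urban, 2 = rural (hypothetical)
zonal(pop, urc, fun = \u0026#34;sum\u0026#34;, na.rm = TRUE) # one population total per class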
How does this app work? The original raster to be aggregated is regional/country-level population, sourced from Worldpop (https://www.worldpop.org/geodata/listing?id=75).\nChoose the level of the urban-rural catchment.\n(Features below are not implemented yet)\nThe partitioning raster could also be an urban-rural classification, time-to-healthcare unit, etc.\nThe level of partitioning will be decided by the user at runtime.\nThe current app supports urban-rural classification for more than 200 countries and regions for 2020.\nLimitations of Current Version The app is as slow as a snail. I can probably fix this with some degree of caching, but working on sf data is generally slow. Additionally, the data from Worldpop is surprisingly granular; aggregating it would require several levels of analysis. It only supports urban-rural classification for partitioning. Implementing other rasters for partitioning is not trivial. It only works for the year 2020. Making it work for other years is trivial but probably not as great an idea because it will make the app even slower. Future Work High Priority Redesign the app with the partitioning raster instead of the country as the focus. Instead of having the choices in the navigation bar, the users should see a single screen where they choose the country and the partitioning raster (possibly multiple partitioning rasters).\nImproving runtime speed (AWS web hosting and better caching can be explored). Currently, the app downloads population and mappings of a country the first time and reuses them when required.\nFor example, when you search the population for Latvia for the first time, it will download the relevant files and save them for future use. Next time someone uses the app for Latvia, the processing will be faster as the files are already available offline.\nLow Priority Include support for years other than 2020. This is not a difficult thing to pull off but would require extensive computing resources, beyond what a laptop can provide. P.S. I do not work or engage with ASAR anymore. I had to drop this project due to other commitments. This app is useful despite its limitations.\n","permalink":"/shiny-urca/","summary":"R Shiny app that creates spreadsheet of population levels segregated at district, state, or any other level","title":"Urban-Rural Population in 200+ Countries"},{"content":"\nA North Carolina-based hygiene products company was facing a lot of returns, sometimes up to 15% of their sales. We were trying to find out why. Using the data they provided for sales, transportation and claims, we found opportunities to streamline distribution. We found that some companies ordered smaller numbers of cases but more frequently, which could be causing returns. Ordering in frequent, small batches was also causing trouble at the warehouse. We were planning to devise an optimal ordering pattern for their customers, which could help the CPG company as well as their customers.\nThe consumer-packaged goods (CPG) company sells around 400 individual stock keeping units (SKUs) and ships 190,000 order-SKU combinations to 1000 customer locations each year. Customers can order in eaches (individual cases) or in layers, but full pallets are preferred due to the inherently higher costs of handling partial pallets. There is a clear advantage in ordering in fewer pallets: it saves the company transportation costs, and it also helps the workers at the company.\nWe designed metrics for orders measured in partial pallets, layers and eaches. (A toy sketch of such a decomposition follows.) 
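This is my sketch, not the project's actual metric; the cases-per-layer and layers-per-pallet numbers are made up, not the company's. An order can be split into full pallets, layers and eaches with integer division:
order_profile = function(cases, cases_per_layer = 10, layers_per_pallet = 5) {
  cases_per_pallet = cases_per_layer * layers_per_pallet
  pallets = cases %/% cases_per_pallet                      # full pallets
  layers = (cases %% cases_per_pallet) %/% cases_per_layer  # full layers left over
  eaches = cases %% cases_per_layer                         # loose cases
  data.frame(cases, pallets, layers, eaches)
}
order_profile(c(50, 63, 112)) # 50 = 1 pallet; 63 = 1 pallet + 1 layer + 3 eaches; 112 = 2 pallets + 1 layer + 2 eaches
The larger the share of an order that arrives as eaches, the more partial-pallet handling it causes.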
Based on these metrics, we identified target-rich opportunities for improvement. A large part of the project was spent understanding the data and converting it into information.\nThis research project aimed to understand whether there is a benefit to changing customer order behavior, in terms of either frequency or amount ordered. Ellysa Groh and I worked under Prof Sean Willems\u0026rsquo; guidance for this project.\n","permalink":"/optimizing-order-frequency-to-minimize-partial-pallet-orders/","summary":"How can a CPG company improve customers\u0026rsquo; ordering behaviour?","title":"Optimizing Order Frequency to Minimize Partial Pallet Orders"},{"content":" The Limitations of Artificial General Intelligence What is it that you see when you see? You see an object as a key, a man in a car as a passenger, some sheets of paper as a book. It is this word ‘as’ that must be mathematically formalised, on par with connectives “and”, “or”, “implies”, and “not”. …Until you do that, you will not get very far with your AI problem.\nBy Stanislaw Ulam, as cited in Bitwise: A Life in Code\nOr, it may be possible but just not yet.\nIt’s like reverse engineering the latest Intel processor with only the basic knowledge of how a transistor works.\nNick Sivo\nHow do we learn? Added on Sunday, July 17, 2022.\nYesterday, I met two guys from Micronesia. They spoke in broken English, and we could communicate little directly. I asked one of them how far Micronesia is from the US; he said he’d been here in the US for three years. How do you say “hi” in Micronesian? Kaselehlie. I had to Google this word; I didn’t remember it at all. Why? The word is so unlike everything I’ve heard.\nWe build knowledge based on our existing knowledge. Every new piece of information needs to be connected to a part of currently existing knowledge. In some ways, this is how our neural networks work.\nThis picture from Obsidian, a note-taking app that links every new note to existing notes through tags, conveys the idea practically.\nOur current models either predict or classify. Neither of these actions is a complete representation of how human intelligence works. In fact, we are far away from understanding how human intelligence works!\nIf the parrot repeats the right answer, is it intelligent? Talking to your younger self Someone trained an AI on personal journal entries of their younger self. The younger self and present self had a conversation.\ntweetrmd::tweet_embed(\u0026#34;https://twitter.com/michellehuang42/status/1597005489413713921\u0026#34;) i trained an ai chatbot on my childhood journal entries - so that i could engage in real-time dialogue with my \u0026quot;inner child\u0026quot;\nsome reflections below:\n\u0026mdash; michelle huang (@michellehuang42) November 27, 2022 OpenAI’s ChatGPT is crazy good Here is a rap battle between Tidyverse and base R built on OpenAI’s Chat. It’s beyond what I had imagined.\ntweetrmd::tweet_embed(\u0026#34;https://twitter.com/tylermorganwall/status/1599238413580132354\u0026#34;) \u0026quot;Write me a poem battle between Tidyverse and Base R\u0026quot;\nAI solved the debate, folks! We never need to have this discussion again😀#RStats pic.twitter.com/jjIKQjGlU7\n\u0026mdash; Tyler Morgan-Wall @tylermorganwall@fosstodon.org (@tylermorganwall) December 4, 2022 Shakespeare style! 
👑\ntweetrmd::tweet_embed(\u0026#34;https://twitter.com/hadleywickham/status/1599254860385628161\u0026#34;) \u0026quot;write me a rap battle about base R versus tidyverse in the style of Shakespeare\u0026quot; pic.twitter.com/2v8Qp7caEU\n\u0026mdash; Hadley Wickham (@hadleywickham) December 4, 2022 R vs Python 😜\ntweetrmd::tweet_embed(\u0026#34;https://twitter.com/tylermorganwall/status/1599253837117739009\u0026#34;) It delivered on R vs Python too! And it\u0026#39;s an equally reasonable and even-handed take on the \u0026quot;debate\u0026quot;👌\nI for one welcome our new, nuanced AI overlords pic.twitter.com/wBUQzA8o7i\n\u0026mdash; Tyler Morgan-Wall @tylermorganwall@fosstodon.org (@tylermorganwall) December 4, 2022 It can work in Hindi and Chinese as well! Here is what I got when I asked it to tell me the story of Buddha.\nSome other interesting chats with AI I asked it to write a story with soul in it Once upon a time, in a land far away, there lived a young girl named Rose. Rose was a kind and gentle soul, who loved nothing more than to spend her days wandering through the lush green forests and fields that surrounded her home. She was always happiest when she was surrounded by nature, and she felt a deep connection to the earth and all of its creatures.\nAs she grew older, Rose began to develop a special gift. She could sense the souls of the animals and plants around her, and she could communicate with them in a way that no one else could. She would spend hours talking to the birds and the bees, the rabbits and the squirrels, and she soon became known throughout the land as the “girl with the soul.”\nOne day, as Rose was walking through the forest, she came across a beautiful deer. The deer was injured, and Rose could sense that its soul was in danger. She quickly tended to the deer’s wounds, and as she did, she felt a strong connection to the animal. She knew that she had to do everything she could to save the deer’s life. For many days and nights, Rose stayed by the deer’s side, nursing it back to health. She sang to it, talked to it, and even shared her food with it. Slowly but surely, the deer began to recover, and its soul started to shine once again.\nOnce the deer was strong enough to stand on its own, Rose said goodbye to her new friend and watched as it ran off into the forest. She felt a sense of joy and fulfillment that she had never known before, and she knew that she had found her true calling in life.\nFrom that day forward, Rose dedicated herself to helping all of the creatures of the earth, and she became known throughout the land as a guardian of the soul. She spent her days traveling from place to place, bringing comfort and healing to all of the creatures she met.\nAnd as she did, she discovered that the more she helped others, the more her own soul flourished and grew.\nRose lived a long and happy life, surrounded by the love and gratitude of all of the creatures she had helped. And when she finally passed away, she left behind a legacy of kindness and compassion that would be remembered for generations to come.\nAnother Shakespeare Poem ","permalink":"/ai/","summary":"What constitutes artificial general intelligence? Where are we lacking? What can we do to \u0026ldquo;get\u0026rdquo; it?","title":"Improvements in Artificial Intelligence"},{"content":"TL;DR My name is Harshvardhan. Since I do not have a last name, I have to force my name to suit the general convention of first and last name. 
Thus, I also go as Harshvardhan Harshvardhan (European convention), FNU Harshvardhan (American convention), or M Harshvardhan (my academic pen-name).
Here is a conversation I had on Teams with one of my instructors when I wanted to clarify a few things.
Instructor: Hi, what is your name? I need your name to go over your answers.
Me: Harshvardhan. It would be registered in UTK system as FNU Harshvardhan or M Harshvardhan. I don't have last name so every system works differently.
Instructor: What do you mean you do not have last name?
Me: My legal name is Harshvardhan. No last or middle name.
Instructor: How?
Me: In my family it's common and in India its allowed. :sweat_smile:
Instructor: Interesting!
This conversation was my highlight of the day. I had a hearty laugh at how uncommon it is in the West not to have a last name.
Names are precious and vital representations of personality. Those two or three syllables call out who a person is and whether they are what they're supposed to be. For humans, the general convention is to have multiple names — first, middle and last — while for animals, we seem content with single names. For dogs, a Rottweiler Tommy from North America is Tommy, and so is a Doberman Tommy from India. They do not need family names, as their human owners don't feel obligated to lend their names.
However, human names can still be weird. For example, Elon Musk called his child X Æ A-Xii, and CNN has a guide to assist you with pronunciation.
What happens when you grow up in a family where last names are as fluid as the first name, and there is no concept of a middle name? Yes, I am from one of those families — and I am not alone. Don't get me wrong: even in India, having only a first name is uncommon. In fact, last names are critical for arranged marriages and many societal customs. My marriage profile would look dubious to most matchmakers.
I was named Harsh at birth and Harshvardhan when I enrolled in school. No middle or last name. Others in my family had similar standing: my father's last name is different from my grandfather's and grandmother's. My mother's last name is different from my maternal grandfather's or grandmother's. My mother didn't take my father's name after marriage. Like me, my brother has no last name. My sister's last name is different from my parents' and grandparents'.
Why?
I had the same question when people told me about the family names.
By taking up a name, you signal your belonging to a particular group, which may or may not work in your favour, depending on the circumstances. Wouldn't it be better to have more degrees of freedom and allow people to choose their last name as well?
The academic world is still oblivious to single names. Sheherazade and Ardiantiono (2020) wrote an excellent article in Nature: Attention science: some people have only one name.
To register for a scientific conference is an easy task for most, but not for us. Like many Indonesian people, we have a single name. Websites often do not allow us to proceed from one page to the next unless we fill out a 'Last/Family name' box — something we're unable to do honestly. Instead, we populate these boxes, wherever they appear, with our first name again or some variant of 'NA'.
Word, Sheherazade and Ardiantiono.
Word.
The short-term fix is more of a nuisance than real trouble. I have to force my name to suit the general convention of first and last name. Thus, I also go as Harshvardhan Harshvardhan (European tradition), FNU Harshvardhan (American way), or M Harshvardhan (my academic pen-name).
If you reached here and went through my ramble — or really anything on my website — you have gained the liberty to call me "Harsh". Pronounced exactly as you imagine.
","permalink":"/my-name/","summary":"I do not have a last name and it freaks people out.","title":"My Name"},{"content":"
The Tennessee Student/Teacher Achievement Ratio (STAR) Project was a large-scale randomised experiment on class size conducted in Tennessee schools. Students and teachers were randomly assigned to one of three class types: small classes (13-17 students per teacher), regular-size classes (22-25 students) and regular classes with a teacher's aide (22-25 students). Over four years, 11,600 students from 80 schools were part of the study. The randomisation (random assignment of teachers and students to one of these classrooms) happened at the school level.
This paper and the STAR project aimed to discover the importance and impact of class size on learning outcomes. Colloquially speaking, it is hypothesised that smaller classes, i.e. classes with a low student-teacher ratio, have better learning outcomes. Krueger (1999) aimed to quantify the learning outcomes based on class size using the Project STAR dataset. The Stanford Achievement Test (SAT) and Tennessee Basic Skills First (BSF) tests were used as proxy variables for student achievement or learning outcome.
Krueger (1999) considers that the small-class dummy variable and the regular-size-with-aide dummy variable together can explain the effect on student achievement — while controlling for other student- and teacher-related attributes (controlled covariates) and the school where the student is enrolled. The outcome of student achievement is measured by the average percentile score on the SAT. The class-size and regular-class-with-aide dummies are taken directly from the project database. Control variables such as gender and age, among others, are included as covariates. School-related effects are included as a separate control variable in the study.
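In equation form, the specification described above looks roughly like this (the notation is my own sketch of what the text describes, not copied from Krueger's paper):
$$ Y_{ics} = \beta_0 + \beta_1 \text{SMALL}_{cs} + \beta_2 \text{AIDE}_{cs} + X_{ics}'\gamma + \alpha_s + \varepsilon_{ics}, $$
where \(Y_{ics}\) is the average percentile score of student \(i\) in class \(c\) at school \(s\), \(\text{SMALL}\) and \(\text{AIDE}\) are the class-type dummies, \(X_{ics}\) collects the student and teacher covariates, and \(\alpha_s\) captures the school effect.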
For more details, see the project report.
Alan B. Krueger, Experimental Estimates of Education Production Functions, The Quarterly Journal of Economics, Volume 114, Issue 2, May 1999, Pages 497–532, https://doi.org/10.1162/003355399556052
","permalink":"/replicating-tennessee-star-project/","summary":"Econometrics Course Project","title":"Replicating Tennessee STAR Project"},{"content":"I upgraded my laptop to a MacBook Air (2020) with the famous M1 processor a few days ago. I went after the most powerful machine in the category, one with 16 gigabytes of RAM and one terabyte of storage — I didn't want to regret my choices anytime soon. M1 has been praised for its performance time and again by multiple reviewers.
M1 Macs are probably the only products that Apple has promoted less than it should.
M1 is smooth.
Sreyan Chaterjee, my friend who recommended the upgrade
Since I got my new machine principally to help with computations, I wanted to test how good it was with R compared to other devices.
Someone on Reddit had tried this with an old version of R built for Rosetta, but now that the native M1 version was out, I wanted to get my hands dirty and test it. RStudio doesn't have a native app for M1 yet, but v1.4 lets you take full advantage of M1's processing power, as the components interacting with R are built natively.
I wrote a small script that performs singular value decomposition of a matrix and reconstructs it, and I noted the time it took on various machines. This experiment isn't scientific by any means, but I hoped I could get my answers.

library(lhs)

e = numeric()
n = 500
k = 30
nsim = 500

t1 = Sys.time()
for (i in 1:nsim) {
  set.seed(i)
  x = maximinLHS(n, k)
  s = svd(x)
  x.new = s$u %*% diag(s$d) %*% t(s$v)
  e[i] = sum((x - x.new)^2)
}
t2 = Sys.time()
tt = t2 - t1

plot(e)
abline(h = mean(e), col = "red")
cat("Total time: ", tt)
cat("\nMean error: ", mean(e))

The code is pretty straightforward. I construct 500 matrices of size 500x30, each randomly designed using Latin hypercube samples. Then I decompose each using singular value decomposition and reconstruct it likewise. Finally, I create a plot of the errors and note the time taken. The plot of errors looks the same on all machines due to the use of the same seeds.
Testing on MacBook Air (2020)
This machine had 1 TB of SSD storage, 16 GB of RAM and the Apple M1 processor. It took 5.6 minutes to run the entire code block.
Testing on MacBook Pro (2019) #1
This machine had 1 TB of SSD storage, 16 GB of RAM and a 2.6 GHz 6-core Intel Core i7. It took 9.2 minutes to run the entire code block.
Testing on MacBook Pro (2019) #2
This machine had 1 TB of SSD storage, 16 GB of RAM and a 2.3 GHz 8-core Intel Core i9. It took 9.7 minutes to run the entire code block.
Testing on Microsoft Surface Pro (5th Gen)
This machine had 512 GB of SSD storage, 16 GB of RAM and a 1.9 GHz Intel Core i7. It took 12.4 minutes to run the entire code block.
Conclusion
It was amazing to see how much better the M1 was. Because all the machines had the same memory (16 GB), I could technically say I was controlling for it, checking how much the run time varies with just the processor and operating system.
Note that RStudio on M1 is still running through Rosetta and yet giving almost twice the performance. I am very excited about all I can do in the days to come.
Holy moly 😂
","permalink":"/how-fast-is-m1/","summary":"Comparing Apple M1 processor with Other Systems in R","title":"How fast is M1?"},{"content":"Next is my short and sweet newsletter about a curated collection of R-related works. It is posted at 9:30 AM (Eastern Time) every Wednesday. The content is pretty straightforward.
Five stories. Four packages. Three jargons. Two tweets. One meme.
Why do I do this?
To learn more about R and statistics, I follow many blogs and people on Twitter. I felt a strong urge to share all I remember with as many people as possible. Blogs had limited readership and were published intermittently. So, to bring myself to a routine, I started this newsletter.
Every Wednesday, I put together some exciting old articles and impressive new innovations in an email sent to more than a hundred learners.
There was solid latent demand: the letter gained a hundred subscribers before the first issue even rolled out. Several positive reviews are a testimony that people like it.
Where can you read past editions?
You can read past editions on Revue.
Where can you sign up?
Right here!
","permalink":"/next-today-i-learnt-about-r/","summary":"My Newsletter on R and Data Science","title":"Next — Today I Learnt About R"},{"content":"In this project, I applied Bayesian decision theory to a classification problem. The datasets used were from Ripley's Pattern Recognition and Neural Networks. The first dataset has two features and a balanced class portfolio. The second dataset is for diabetes in Pima Indians, with seven features, where the number of diabetic patients is much smaller than the number of normal patients.
Code to perform these calculations is in this Github repository.
The heart of this project was in three functions.
Euclidean Classifier
Mahalanobis Classifier
Bayesian Quadratic Classifier
Euclidean Classifier
The features are assumed to be statistically independent of each other (strictly speaking, uncorrelated) and to have the same variance.
Geometrically, the samples fall in equal-radii hyperspherical clusters.
The decision boundary for a two-class problem is a hyperplane in \(d\)-dimensional space.
$$ \Sigma = \begin{bmatrix} \sigma^2 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \sigma^2 \end{bmatrix}. $$
$$ g_i(\vec{x}) = - \frac{||\vec{x} - \vec{\mu_i}||^2}{2\sigma^2} + \ln{P(\omega_i)}. $$
Python Function

# imports assumed by all three functions (not shown in the original post)
import numpy as np
import time as t

def euclid_classifier(xtrain, ytrain, xtest, ytest, pw):
    t1 = t.time()
    pw0 = pw
    pw1 = 1 - pw
    nn, nf = xtest.shape
    # for class 0
    arr = xtrain[ytrain == 0]
    covs0 = np.cov(np.transpose(arr))
    means0 = np.mean(arr, axis=0)
    # for class 1
    arr = xtrain[ytrain == 1]
    covs1 = np.cov(np.transpose(arr))
    means1 = np.mean(arr, axis=0)
    # for euclidean distance: pooled average variance across features
    covavg = (covs0 + covs1) / 2
    avg_var = np.mean(np.diagonal(covavg))
    # initialising yhat array
    yhat = np.ones(len(ytest))
    for i in range(len(ytest)):
        # for class 0
        d = np.dot(xtest[i] - means0, xtest[i] - means0)
        g0 = -d / (2 * avg_var) + np.log(pw0)
        # for class 1
        d = np.dot(xtest[i] - means1, xtest[i] - means1)
        g1 = -d / (2 * avg_var) + np.log(pw1)
        # if g0 > g1, then i belongs to 0, else to 1
        if g0 > g1:
            yhat[i] = 0
    overall_acc = np.sum(yhat == ytest) / len(ytest)
    class0_acc = np.sum(yhat[ytest == 0] == 0) / np.sum(ytest == 0)
    class1_acc = np.sum(yhat[ytest == 1] == 1) / np.sum(ytest == 1)
    t2 = t.time()
    tt = t2 - t1
    return yhat, overall_acc, class0_acc, class1_acc, tt
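All three classifiers below share this signature, so a quick usage sketch looks like the following. The synthetic data here is my own illustration, not one of the project's datasets; pw is the prior probability of class 0.

# usage sketch on synthetic two-feature data (my illustration,
# not the project's datasets)
rng = np.random.default_rng(0)
x0 = rng.normal(0, 1, size=(100, 2))   # class 0 samples
x1 = rng.normal(2, 1, size=(100, 2))   # class 1 samples
xtrain = np.vstack([x0[:80], x1[:80]])
ytrain = np.concatenate([np.zeros(80), np.ones(80)])
xtest = np.vstack([x0[80:], x1[80:]])
ytest = np.concatenate([np.zeros(20), np.ones(20)])

yhat, acc, acc0, acc1, tt = euclid_classifier(xtrain, ytrain, xtest, ytest, pw=0.5)
print(acc, acc0, acc1, tt)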
Mahalanobis Classifier
The covariance matrices for all classes are identical, but not the identity matrix times \(\sigma^2\); the variance is still constant across classes.
Geometrically, the samples fall in a hyperellipsoidal shape.
The decision boundary is a hyperplane in \(d\)-dimensional space.
$$ g_i(\vec{x}) = - \frac{1}{2}(\vec{x} - \vec{\mu_i})'\Sigma^{-1} (\vec{x} - \vec{\mu_i}) + \ln{P(\omega_i)}, $$
where \(\Sigma_i = \Sigma\) for all classes.
Python Function

def maha_classifier(xtrain, ytrain, xtest, ytest, pw):
    t1 = t.time()
    pw0 = pw
    pw1 = 1 - pw
    nn, nf = xtest.shape
    # for class 0
    arr = xtrain[ytrain == 0]
    covs0 = np.cov(np.transpose(arr))
    means0 = np.mean(arr, axis=0)
    # for class 1
    arr = xtrain[ytrain == 1]
    covs1 = np.cov(np.transpose(arr))
    means1 = np.mean(arr, axis=0)
    # for the Mahalanobis distance, the average of the two covariance matrices is used
    covavg = (covs0 + covs1) / 2
    # initialising yhat array
    yhat = np.ones(len(ytest))
    for i in range(len(ytest)):
        # for class 0 (the factor 0.5 matches the discriminant above)
        d = np.matmul(np.matmul(xtest[i] - means0, np.linalg.inv(covavg)), xtest[i] - means0)
        g0 = -0.5 * d + np.log(pw0)
        # for class 1
        d = np.matmul(np.matmul(xtest[i] - means1, np.linalg.inv(covavg)), xtest[i] - means1)
        g1 = -0.5 * d + np.log(pw1)
        # if g0 > g1, then i belongs to 0, else to 1
        if g0 > g1:
            yhat[i] = 0
    overall_acc = np.sum(yhat == ytest) / len(ytest)
    class0_acc = np.sum(yhat[ytest == 0] == 0) / np.sum(ytest == 0)
    class1_acc = np.sum(yhat[ytest == 1] == 1) / np.sum(ytest == 1)
    t2 = t.time()
    tt = t2 - t1
    return yhat, overall_acc, class0_acc, class1_acc, tt

Bayesian (Quadratic) Classifier
The covariance matrix is different for each class.
It is a quadratic classifier.
$$ g_i(\vec{x}) = -\frac{1}{2} (\vec{x} - \vec{\mu_i})'\Sigma_i^{-1} (\vec{x} - \vec{\mu_i}) - \frac{1}{2} \ln{|\Sigma_i|} + \ln{P(\omega_i)}. $$
Python Function

def bayes_classifier(xtrain, ytrain, xtest, ytest, pw):
    t1 = t.time()
    pw0 = pw
    pw1 = 1 - pw
    nn, nf = xtest.shape
    # for class 0
    arr = xtrain[ytrain == 0]
    covs0 = np.cov(np.transpose(arr))
    means0 = np.mean(arr, axis=0)
    # for class 1
    arr = xtrain[ytrain == 1]
    covs1 = np.cov(np.transpose(arr))
    means1 = np.mean(arr, axis=0)
    # initialising yhat array
    yhat = np.ones(len(ytest))
    for i in range(len(ytest)):
        # class-specific covariances make the discriminant quadratic
        d = np.matmul(np.matmul(xtest[i] - means0, np.linalg.inv(covs0)), xtest[i] - means0) * -0.5
        g0 = -0.5 * np.log(np.linalg.det(covs0)) + d + np.log(pw0)
        d = np.matmul(np.matmul(xtest[i] - means1, np.linalg.inv(covs1)), xtest[i] - means1) * -0.5
        g1 = -0.5 * np.log(np.linalg.det(covs1)) + d + np.log(pw1)
        # if g0 > g1, then i belongs to 0, else to 1
        if g0 > g1:
            yhat[i] = 0
    overall_acc = np.sum(yhat == ytest) / len(ytest)
    class0_acc = np.sum(yhat[ytest == 0] == 0) / np.sum(ytest == 0)
    class1_acc = np.sum(yhat[ytest == 1] == 1) / np.sum(ytest == 1)
    t2 = t.time()
    tt = t2 - t1
    return yhat, overall_acc, class0_acc, class1_acc, tt

","permalink":"/supervised-learning-using-baysian-decision-rule/","summary":"Python Functions for Bayesian Learning (COSC 522 Project)","title":"Supervised Learning Using Bayesian Decision Rule"},{"content":"
I recently moved to the United States (USA, or America as everyone calls it) for my PhD at the Haslam College of Business, University of Tennessee. Travelling during the pandemic is a major safety issue. Though vaccinations are common these days and I am fully vaccinated, I can't assure the same for all my fellow passengers. Airlines require and provide masks and sanitisers during travel, but we all know it's not enough. With these little risks, I left my home on July 20, 2021 to start the new phase of my life.
The US surprised me with the ease they had with COVID-19.
Maybe the high vaccination rates, or too much exposure to COVID-19, were the reason they were so confident. I was keeping my distance from crowds, but the Americans didn't have that reluctance.
In the next few paragraphs, I will write about a few uncanny things I learnt about the USA. Some are cultural shocks, others are technological shocks.
Masks and Vaccination
It was probably a smart decision by the CDC to stop the mask mandate, as its enforceability is seriously dubious if the general population doesn't trust that it's effective. The vaccine hesitancy really caught my eye. Unlike India, where vaccine supply was the limiting factor on how many people were vaccinated, Americans chose not to get vaccinated despite easily available vaccines.
I had some tangential idea of it, as India actually has one of the lowest vaccine hesitancy rates, but seeing it unfold in front of my eyes was surprising. Most Indians think vaccines are important, safe and effective. However, Americans tend to doubt the importance, safety and effectiveness of vaccines. Source.
Thankfully, with the fear of the Delta variant, the vaccination rates are picking up. (On a side note, I still wonder if they should increase the gap between dosages (for the Pfizer or Moderna vaccines) to make them more effective.)
Public Transport
Public transport is not a popular option for travelling within the United States. Transport systems do exist in major cities, but their frequency of operation is surprisingly low. For travelling between states, you have either ill-timed bus services like Greyhound or claustrophobic flights. Amtrak railways are present in only around thirty cities.
My experience with interstate buses wasn't positive. I booked a Greyhound bus from Knoxville, TN to Columbus, OH, and the bus was seven hours late! The operator on duty shared that it was due to an acute shortage of drivers. More on employment later. I was satisfied with KAT buses (Knoxville local transport), though their frequency of once an hour could certainly be improved.
Cars are the primary mode of transport here. Most residents own a car and use it as their primary transport — no matter the distance to cover. In India, we use cars for travelling within cities, or between two cities if the timings of railways or buses aren't convenient. Since public transport is an inferior alternative in the US, cars are the end-all-be-all medium of travel.
The absence of public transport is enraging in certain instances. For my new apartment, I needed to buy some furniture but couldn't figure out how to transport it from the store to my home. Everyone expected me to have a truck!? When I enquired how people move their furniture if they do not have trucks, they suggested I check out U-Haul. U-Haul rented me trucks, but I had to drive them myself. How would I know how to drive a truck!?
Payments
Almost everyone here has a credit card, and that's the backbone of payments. However, if you are new in the system like I was and do not have social security set up yet, you cannot get a credit card. Further, credit card payments are not instantaneous and free. I had a habit of scanning the ubiquitous UPI QR codes for payments, and I expected the payment system to be free, fast and safe.
Recently Venmo and Cash App became popular, which are like the Paytm wallet that we had in India before 2014. However, you are still linked to one app only, and cross-app transfers are not possible.
Their best payment system is ten years behind the current payment system of India. Google had recommended that the Federal government implement an Indian UPI-like system in the US. I hope they launch something like it soon.
Food
Servings in the US are much more than a typical meal. Unfortunately, there are no half-servings either. For $6.99, we get a plate so full of food that you can eat once, pack, reheat and eat again, pack, reheat and eat again.
Later, I learnt that the servings are huge because many cannot afford three square meals a day and therefore aim to complete their daily calorie requirements with a single meal. Serving half the food for half the price, and consuming it twice in a day, would be a healthier alternative.
It is very easy to spot unhealthy food. McDonald's and Burger King are affordable and easy to find. However, fresh fruits, vegetables, salads, meat, etc. are much more expensive and elusive. How come fresh uncooked chicken is more expensive than a chicken burger!?
Wastages
Water: The shower knobs in bathrooms cannot control the intensity of water flow; only the temperature. Guess the amount of water that's simply wasted during temperature adjustment, or even during the shower. (Sometime in November 2021, my roommate Jack told me I actually could control the flow: the knob that controlled temperature also controlled water flow. Bad design, but at least that's something.)
Electricity: The electrical plugs do not have a switch on the socket. If it's plugged in, it's switched on. Imagine the electricity wasted because you didn't unplug your rice cooker or oven. In fact, certain equipment is never switched off at all!
Plastics: In India, I was charged for every single plastic bag to deter me from using plastics and to encourage using my own bags. In the US, I bought five items and the shopkeeper gave me six bags — for free.
Employment
The economic impact payments may have some effect on this, but there are so many places looking to hire. McDonald's, Walmart, Aldi, or literally any place that requires humans for service is currently hiring. All this, while unemployment is so high!
Identity
In India, we have Aadhaar, which is used for all identification. We can get it easily, it is linked to our biometric prints, and it is basically a QR code and a number. Even that number can be masked with temporary numbers!
In comparison, the US has something called the Social Security number, which people try their best to hide. If the number is leaked, there could be dire consequences. Why can't it be masked? For identification during travel, if you do not have a driving licence, you have to rely on your passport. I do not think carrying a passport every time you travel is optimal.
Carrying these identification documents' physical copies is risky and unsafe. You can lose them or end up exposing them to unintended audiences. In India, I could have a single DigiLocker account which carried all my papers and documents, accessible for more than identification. I wish the US would launch a similar digital identification for all.
These shocks make me appreciate how far we (India) have come today. The Internet certainly has been the driving force. I have long believed India is the future and will be the place the world looks up to in the next decade.
All these only make my conviction stronger.
","permalink":"/ten-days-in-the-united-states/","summary":"Cultural and technological shocks","title":"Ten days in the United States"},{"content":"Every piece of writing is meant to be read — at least by the writer. As the writing and the writer gain popularity, the readership expands. Most of my blog posts are read only by me. Some, like COVID-19 data testing and Spotify visualisation, did gain traction. But by and large, I am my website's audience.
Looking back at old writings means finding a lot of errors, mostly grammatical. Though I resist the temptation to edit already-published articles, I turn pedantic and correct my mistakes.
Academic Writing
Such behaviour — correcting mistakes after publishing — is not appreciated in academia. An erratum is typically used only for significant errors like calculation errors or misleading phrasing — not for trivial grammatical errors.
Because the academic publication cycle involves multiple editing and review rounds, the likelihood of such errors is marginal. For a case study, the reviewers and I exchanged emails and drafts seven times, even though the case study was accepted after only the first review. Comments from the reviewers have benefited me tremendously beyond that case study.
Paul Silvia's book "How to Write a Lot" addressed many concerns around writing. An important suggestion was to allocate time to all writing activities, not just putting words down.
Writing activities include reading and writing reviews, editing drafts, bureaucratic processes like submitting manuscripts, and writing new drafts.
When these writing activities are scheduled regularly, they become a habit. When I reread an original sentence now, my brain usually recreates it while conveying the same idea; often the revised article has better structure and organisation.
Yihui Xie wrote long ago about the nuances of scientific writing and brought out some exciting ideas. Another, much more engaging, article is How to Write Consistently Boring Literature.
Writing for Fun
I started blogging in 2017. Initially, my blog was hosted as a WordPress website with a domain (harshvadhan.xyz) from GoDaddy. The domain was deeply discounted for the first year (Rs 300 or $4) but got expensive the next year (Rs 3,500 or $50). I didn't renew my domain and moved on to Google Sites, which was free and surprisingly simple to get started with.
Last year I moved to Google Domains, which has fixed yearly bills. But Google Sites had many limitations and inconsistencies. I could choose a font for the site, but it didn't apply to all pages. Pasting from a text editor allowed line spacing, but there was no such option in the Sites editor. Eventually, every page looked different.
Hosting raw files was surprisingly involved. I had to upload them to Google Drive, then create a public sharing link, and it still required the viewer to log in to their Google account. If the user was a G Suite user, their admin had to approve Google Sites; otherwise, viewing this site required logging out of their G Suite account. IIM Indore didn't allow Google Sites initially, but surprisingly the IT Department changed this once I emailed them about it.
Google search didn't index all my blog posts. I tried manually injecting some links in Google Console, but it was a prolonged process, as Google took around a week to review each link submitted manually.
I also didn't have access to the robots.txt file that could direct crawlers to each page.
Sometime last year, I heard about Blogdown. I tried it twice and failed both times — using so many systems together was confusing. Two months ago, Alison Hill's fantastic talk "Introduce yourself online using blogdown and Hugo Apèro" came up in my YouTube recommendations, and I followed the screencast step by step. I was up and running with Blogdown!
I ported all but a few blog posts from my old website to the new one. The new site looked different and was much faster than Google Sites. Another good part was that indexing on search engines was much easier, as I had manual control over the robots.txt file.
With this current system, I have my casual writing and publishing figured out. I am still pondering how to proceed with scientific writing.
","permalink":"/technical-and-casual-writing/","summary":"My experience of technical and casual writing.","title":"Technical and Casual Writing"},{"content":"
At Aspect Ratio, I developed a pipeline for predictive modelling of Merck's HPV vaccine. I had to find the crucial variables in the analysis. There are many methods to find them: identify significant variables in simple linear regression, or use feature-selection methods such as Principal Components Analysis and Fisher's Linear Discriminant. The goal was to determine what factors drove vaccine coverage in US counties and states and where to promote it.
We didn't implement the project during my short tenure. The vaccine GARDASIL 9 is now available for sale across the US.
During that time, I realised we made a lot of calls to the central insurance database of patients provided by Merck and other insurance companies. Most of them used SQL for this; however, some were fans of dbplyr and Python connectors. The database was massive, and each programmer wrote their own functions to collect information. In the long term, this resulted in inconsistencies in code and thus in results.
To solve this, I created an R Shiny app that writes standardised SQL code. It took user inputs on what information was needed and presented SQL queries in text form that could be copied and pasted into their favourite SQL tool.
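The core idea is simple enough to sketch. Here is a minimal, hypothetical version of such a query builder; the function and column names are my illustration, not the actual app.

# a minimal sketch of a standardised query builder (hypothetical names;
# not the production app). In Shiny, columns, table and condition
# would come from input widgets.
build_query = function(columns, table, condition = NULL) {
  query = paste0("SELECT ", paste(columns, collapse = ", "), " FROM ", table)
  if (!is.null(condition)) {
    query = paste0(query, " WHERE ", condition)
  }
  paste0(query, ";")
}

build_query(c("patient_id", "claim_date"), "claims", "state = 'TN'")
# [1] "SELECT patient_id, claim_date FROM claims WHERE state = 'TN';"

Because every analyst pastes the same generated text, the queries stay consistent no matter which tool they use.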
Aspect Ratio and Merck were terrific companies to work with. My manager, Sneha Apte, was considerate of project timelines and a strong proponent of independence at work. I worked almost entirely with R (Merck is a donor to the R Project). Sneha and many others on our team worked with Python. However, that was never a problem. The team and our analysis were truly platform-agnostic.
We also had little exchange sessions on cool new tools and ideas every month: a team member sharing something unique they had created, a tutorial about a topic in R or Python, or sometimes just discussions on making slides or presenting. Overall, a wonderful experience.
","permalink":"/aspect-ratio/","summary":"My experience of working as an analyst for Merck Inc. at Aspect Ratio, Pune","title":"Aspect Ratio (Merck Inc.)"},{"content":"I was reading the book "How Not to Be Wrong: The Power of Mathematical Thinking" by Jordan Ellenberg. The book introduces a paradox named after Daniel Ellsberg, a young analyst at RAND Corporation, famous for leaking the Pentagon Papers to the public.
Von Neumann and Morgenstern had proven that all individuals act based on certain rules so as to maximize their utilities (The Theory of Games and Economic Behaviour, 1944). While working at RAND on how humans take decisions in the face of uncertainty, Ellsberg devised a famous experiment, now known as Ellsberg's Paradox.
Suppose there is an urn (or a bag) with ninety balls inside. You know that thirty of the balls are red; concerning the other sixty balls, you know only that some are black and some are yellow. The experimenter describes to you the following bets:
Red: You get $100 if the next ball pulled from the urn is red, else you get nothing.
Black: You get $100 if the next ball is black, otherwise nothing.
Not-red: You get $100 if the next ball is either black or yellow, otherwise nothing.
Not-black: You get $100 if the next ball is either red or yellow, otherwise nothing.
Which bet do you prefer: Red or Black? What about Not-red vs Not-black?
How Not to Be Wrong: The Power of Mathematical Thinking, Jordan Ellenberg
Simulation
I want to run a simulation study of this experiment. We know only that some balls are black and some are yellow. I want to vary the number of black and yellow balls in the urn to check the most sensible bet at different combinations.

# total number of balls
n_total = 90
# red balls
n_red = 30
# possible numbers of black balls
np_black = c(1:(n_total - n_red))
np_yellow = n_total - n_red - np_black

Now, think about this — the number of red balls is fixed (30). The number of black and yellow balls can vary between one and 59, with their sum fixed at 60. Therefore, for each number of black (or yellow) balls, we have distinct probabilities for all four bets. Let us calculate them.

# creating vectors to store probabilities of black and yellow colours
# (probability of red will always remain 1/3)
p_black = np_black/n_total
p_yellow = np_yellow/n_total

Let us visualize these results.

plot(p_black, type = "l", lwd = 3, col = "black",
     xlab = "Number of Black Balls", ylab = "Probability of Bets")
abline(h = 1/3, lwd = 3, col = "red")
lines(p_yellow, type = "l", lwd = 3, col = "yellow")

Depending on how many black balls are in the urn, you have different probabilities of winning — nothing unexpected. It is apparent that one should choose Yellow when the number of black balls is less than thirty and Black when the number of black balls is more than thirty. The only case when one could choose Red is when all three colours are present in equal numbers, and one should be indifferent between all three at that point.
Catching Up
However, our original experiment wasn't about choosing any of these individual colours; it was about choosing between the four bets: Red, Not-red, Black and Not-black. Which of these is the better option? Let's find out!
The probability of Red remains fixed at one-third, no matter the combination of black and yellow. Therefore, the probability of Not-red also remains fixed at two-thirds. The probability of Black varies between 1/90 and 59/90. Therefore, the probability of Not-black varies from 89/90 down to 31/90.
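Before plotting, here is a quick sanity check in the spirit of the simulation above (this snippet is my addition): if every count of black balls from 1 to 59 is treated as equally likely, the two "not" bets have the same expected probability of winning.

# expected win probabilities of the two "not" bets, averaging over all
# equally likely counts of black balls (my addition to the simulation)
p_not_red = rep((n_total - n_red)/n_total, length(np_black))  # always 2/3
p_not_black = (n_total - np_black)/n_total                    # red or yellow
mean(p_not_red)    # 0.6666667
mean(p_not_black)  # 0.6666667, identical on average

Both come out to two-thirds, which is why the comparison below comes down to uncertainty rather than expected value.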
Let's visualize all of these!
Bets Not-red and Not-black are represented by blue and gray respectively.

plot(p_black, type = "l", lwd = 3, col = "Black",
     xlab = "Number of Black Balls", ylab = "Probability of Bets", ylim = c(0,1))
lines(1 - p_black, lwd = 3, col = "Gray")
abline(h = 1/3, lwd = 3, col = "Red")
abline(h = 2/3, lwd = 3, col = "Blue")

Clearly, Not-red dominates Red and Black. Not-black dominates Red. So, if presented a choice, I would bet on Not-black if the number of black balls is less than 30 and Not-red if the number of black balls is more than 30. Since the number of black balls is not known, I will compare the expected values — which are exactly the same for both cases. (A simple way to check this is to compare the area under the curve for the blue and gray lines. In this case, they are identical.)
The Paradox
However, Ellsberg found that almost everyone preferred the Not-red bet over Not-black, even when game theory and statistics showed that the two choices should be equally preferred. This phenomenon was termed "uncertainty aversion".
When individuals are presented with choices that are equally profitable, they choose the one that has lower uncertainty. This uncertainty is different from risk, which is usually measured as standard deviation in statistics. Risks are known unknowns and uncertainties are unknown unknowns (Donald Rumsfeld). When presented with the latter, we always choose the option that has lower uncertainty.
This experiment and paradox may sound trivial today, but when first presented to the world, it was a breakthrough. The utility maximization theory of Von Neumann and Morgenstern was seriously challenged by the results. Utility theory, hitherto unchallenged, had met its first limitation. Today, these limitations are well accepted among economists.
","permalink":"/ellsberg-paradox/","summary":"Simulation of Ellsberg's Paradox","title":"Ellsberg's Paradox"},{"content":"I realised IRCTC's captcha system is a brilliant move. Why? Well, you can't do away with captcha systems. They are everywhere, identifying robots from humans. At the same time, people don't want to see ads, and everyone uses ad-blockers specifically for that purpose. Why not integrate something indispensable with something disliked?
That's what IRCTC did.
You have to read the ad to know the captcha code. By asking you to type "Jeevan Shanti" as the code, it naturally anchors you to choose this LIC insurance product over others the next time you buy insurance.
The system is provided by Simpli5d, which also boasts of customers like Axis Bank, Tata, Visa, Volkswagen, and many more.
","permalink":"/irctc-captcha-is-smart/","summary":"Why not integrate captcha and ads?","title":"IRCTC's Captcha is Smart"},{"content":"This year, I am participating in useR 2021, the R conference. It is my first conference, so I expect to miss out on a few things but still hope to learn as much as I can. I am relatively free next week, so learning as much as possible from the conference will be my central focus.
Sadly, I didn't realise I had to register for the tutorials separately — I thought one registration was all that's needed.
I cannot attend the tutorials, but it is likely that the authors will upload materials on their personal websites, where I can pick things up when needed. The complete session plan is here.
Yihui Xie: Blogging and Writing Books
Yihui met John Kimmel, who inspired him to write his first book. During his PhD, he wrote the knitr package. Writing the complete documentation would have taken a lot of time, so he decided to write a book on the package instead. The same pattern continues to this day: a package is written; alongside some limited documentation, a detailed book is written.
The content of the first book (on knitr) included package documentation, Stack Overflow and mailing-list questions, and some internal workings of the package.
Yihui liked Chapman and Hall (editors and publishers) so much that he still publishes books with them. In fact, he recommends them to most bookdown users.
A critical problem with using books as an alternative to documentation is the inability to update them. Software and packages are subject to constant updates and improvements that cannot be pushed to books so easily. If it is an online Bookdown book, it is relatively simple. Blogdown was subject to multiple updates because the underlying Hugo updates. Such changes can be reflected with some difficulty in the Blogdown book, but that's impossible with printed books.
Solution? Apart from a disclaimer in the printed book to check the current version, I cannot think of anything more. I suspect hardly anyone buys the printed copies of the books, as the free version is available online.
During the presentation, I got the idea of changing the permalinks on my website to the year/month/slug format instead of the current /slug format. After sleeping over it, I decided against it because I frequently update my old blogs and sometimes also change publishing dates (some may call it wrong practice). Therefore, for consistency and unbroken URLs, I will continue with /slug.
Yihui mentioned the talk by David Robinson on "The unreasonable effectiveness of public work", available on the RStudio website. I will watch it soon.
He had some tips to share on blogging:
Only after publishing more than 30 posts should you really consider themes. As a beginner, do not spend a lot of time on this. I use Hugo Apero for this website, and thanks to Yihui and Alison Hill, I do not have to worry a lot about my theme.
Enable comments so that there is a dialogue between the author and the readers. My experience with comments hasn't been great. Around three years ago, I used to host my site on WordPress and had comments enabled. The comment fields were filled with spam, even inappropriate spam. Further, I have found that putting my email directly has provided better engagement. (Once I received comments via email from halfway across the world, and it made my day!)
Pagination in the list of blogs is probably not a good option. Yihui says he likes to see everything you have written at a glance, and clicking "next" after every five articles breaks the flow. I have increased the number of articles on a page to 20, which I believe is a good balance between currency and flow.
I am definitely going to be more casual about my blogs from now on.
Slides
Update: January 27, 2022
Of course I watched David Robinson's talk and studied the career paths of Alison Hill and Yihui Xie (whose ideas I also borrowed for my talk).
Another noteworthy tweet on this topic is the following.
"Things that are still on your computer are approximately useless." -@drob #eUSR #eUSR2017 pic.twitter.com/nS3IBiRHBn
— Amelia McNamara (@AmeliaMN) November 3, 2017
","permalink":"/yihui-on-blogging-and-writing-books/","summary":"Attending My First Academic Conference","title":"Yihui on Blogging and Writing Books"},{"content":"Invento Robotics by Balaji Viswanathan is probably one of the most famous start-ups in the Indian robotics space. Their flagship robot Mitra was used at the high-profile Global Entrepreneurship Summit in October 2017. This marketing case study is on designing their marketing plan. This research work was selected for The Case Centre 2021 Competition in the Hot Topics category. Colloquially, this competition is known as the Oscars of case studies.
Abstract
Based in Bengaluru, India, Invento Robotics (Invento) was a start-up that manufactured humanoid robots. In October 2017, Invento received a mandate to develop a humanoid robot to welcome India's prime minister and the US president's senior adviser at the Global Entrepreneurship Summit that year in Hyderabad. It was a high-stakes event for Invento; a positive reception to the robots would mean many leads for the company, while a negative response could mar Invento's reputation. While Invento's time had hitherto been spent on research and development, there was pressure for it to now determine an overall marketing strategy. It was imperative that Invento quickly crystallize ideas about its customers, product positioning, branding, pricing, distribution channels, and promotions.
Citation
Harshvardhan, M. and Kumar, B. (2021). "Invento Robotics: Launching Humanoid Robots". Case Study and Teaching Note. Ivey Publishing. [The Case Centre]
Links
The Case Centre: https://www.thecasecentre.org/products/view?id=178314
Blog: https://www.harsh17.in/invento-robotics/
","permalink":"/invento-robotics-launching-humanoid-robots/","summary":"Invento Robotics by Balaji Viswanathan is probably one of the most famous start-ups in the Indian robotics space. Their flagship robot Mitra was used at the high-profile Global Entrepreneurship Summit in October 2017. This marketing case study is on designing their marketing plan. This research work was selected for The Case Centre 2021 Competition in the Hot Topics category. Colloquially, this competition is known as the Oscars of case studies.","title":"Invento Robotics: Launching Humanoid Robots"},{"content":"There isn't one quote that I love. Euphemisms and quotes keep me going. As Zig Ziglar said, "People often say that motivation doesn't last. Well, neither does bathing - that's why we recommend it daily."
I would like to have a collection but keep procrastinating. Starting today, I'll add new quotes, as "the best time to plant a tree was twenty years ago; the second-best time is now."
I also made a website using R Shiny that scrapes a random Wikiquote and puts it on a nice background. Check it out!
In no particular order, they are:
Almost everything will work again if you unplug it for a few minutes, including you. — Anne Lamott
If you don't bring it with you, you won't find it there. — Poem on pilgrimage
Coincidence and intention are two sides of a tapestry, my lord.
You may find one more agreeable to look at, but you cannot say one is true and the other is false. — Ted Chiang, Exhalation
Cherish your visions; cherish your ideals; cherish the music that stirs in your heart, the beauty that forms in your mind, the loveliness that drapes your purest thoughts, for out of them will grow all delightful conditions, all heavenly environment; of these, if you but remain true to them, your world will at last be built. — James Allen, As a Man Thinketh
Every man is where he is by the law of his being; the thoughts which he has built into his character have brought him there, and in the arrangement of his life there is no element of chance, but all is the result of a law which cannot err. — James Allen, As a Man Thinketh
The world will ask you who you are, and if you don't know, the world will tell you. — Carl Jung
I used to be a scientist. I warned the town about the upcoming volcano. And then it became all about weed. — Randy, South Park
The true mind can wither all the lies and illusions without being lost. The true heart can touch the poison of hatred without being harmed. — Lion Turtle from Avatar: The Last Airbender
It seems, in fact, as though the second half of a person's life is made up of nothing but the habits they accumulated during the first half. — Fyodor Dostoyevsky
Anything that prevents people from doing great work has an inverse that helps them to. — Paul Graham
The future is already here. It's just not evenly distributed yet. — William Gibson
If I had to put the recipe for genius into one sentence, that might be it: to have a disinterested obsession with something that matters. — Paul Graham
What are the most important problems in your field, and why aren't you working on one of them? — Hamming
Don't use dissatisfaction as an excuse to be lazy. — Paul Graham
Sometimes it's not what you do but what you don't do that makes you successful. — Paul Graham
Content is important. Medium is not important.
I walk with the assurance of a sleepwalker towards my destiny. — Adolf Hitler
I am like a snake, I slough my skin and start afresh. — Goethe
It's not the plane, it's the pilot. — Top Gun
The oracle said to you exactly what you needed to hear. — The Matrix
High Quality Work Produced = (Time Spent) x (Intensity of Focus)
Friday afternoon. Spend 10% of your time to stop and think: "What are the important problems to solve? What are the fundamental structures?" — Hamming
I have to fail in at least 90% of my attempts to be successful enough.
Do the best you can until you know better. Then when you know better, do better. — Maya Angelou
Don't get sloppy. It's the little things that trip you up. — The Age of Adaline
Nothing in this world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination alone are omnipotent. The slogan Press On! has solved and always will solve the problems of the human race. — Calvin Coolidge. (Also in The Founder)
For every moment of triumph, for every instance of beauty, many souls must be trampled. — Hunter S. Thompson
Our intuition about the future is linear.
But the reality of information technology is exponential, and that makes a profound difference. If I take 30 steps linearly, I get to 30. If I take 30 steps exponentially, I get to a billion. — Ray Kurzweil
It is not the knowing that is difficult, but the doing. — Chinese proverb
There is always a well-known solution to every human problem—neat, plausible, and wrong. — H.L. Mencken
You are only as young as the last time you changed your mind. — Kevin Kelly
Be governed not by the tyranny of the urgent but by the elevation of the important. — Kevin Kelly
Become the best in the world at what you do. Keep redefining what you do until this is true. — Naval
It is not about the capacity to think but rather the choice of what to think about. — David Foster Wallace
If you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea. — Antoine de Saint-Exupéry, The Little Prince
Pick battles big enough to matter, small enough to win. — Jonathan Kozol
The most pitiful among men is he who turns his dreams into silver and gold. — Kahlil Gibran
Don't be too ambitious. Do the most important thing you can think of doing every year and then your career will take care of itself. — Henry Kissinger
Nobody will ever win the battle of the sexes. There is too much fraternising with the enemy. — Henry Kissinger
If all you have is an opinion, then I value the expert over you.
It is double pleasure to deceive the deceiver.
Talents can be overrated.
These relationships aren't real. They are just means to an end.
You can't unfuck what has been fucked.
100% honesty is not the most diplomatic and the safest option with emotional beings.
One miracle at a time, okay sweetie?
Can you differentiate a preferred stock from livestock?
Act as if.
I don't see your face in the mirror every morning.
Three rules of Wall Street: never play by the rules, never tell the truth and never pay by cash.
Rumours are premature facts.
I'm standing in a shitstorm and nobody has got an umbrella.
You got to live in the world you're in. Not the world you wish you were in.
How can you learn from your mistakes, if you can't remember them? — Westworld, S1E1
Consciousness isn't the journey upwards. It's the journey inwards. It's not a pyramid, it's a maze. (Westworld, S1E1)
You don't choose the things you believe in. They choose you.
None of us are the same as we were a moment ago and we shouldn't try to be.
You're always gonna disappoint somebody. So, fuck it.
The past is just a story we tell ourselves.
Sometimes I feel that I have already felt everything that I'm ever gonna feel.
It's saddening to see we glorify the words people leave behind, yet turn a deaf ear when they scream the same from rooftops.
No one can build their happiness on another's pain.
I loved her, and sometimes she loved me too.
Paradoxically, people like to see their problem solved with some magic while refusing to believe in magic.
There is no goodness in being good.
For a heart, life is simple.
It beats as long as it can; and then it stops.
Trust is a perfect shit to ruin.
Falling in love with you was the easiest thing I had ever done.
Don't be the little guy who thinks he knows how to spell banana but doesn't know when to stop.
I'm on my journey to know everything of something and something of everything. Jack of all trades and master of some.
I think logically and write codes for the computers to do the same.
Cleopatra to Antony, "Go, fight that uprising. Be a military leader! Be a Caesar". "I'm no Caesar". — Shakespeare (Antony and Cleopatra)
Life is like a game of Jenga. No matter how well you craft it, eventually, it's going to fall. You might try a ton of strategies and tactics, but nothing can prevent it. It is not like a jigsaw puzzle, where if you try hard and long enough, you will make it.
Enter with a bang and close the door on your way back!
You don't get points for living in pain. — Krishna Das (Pain, Dalai Lama and Hanuman, Spotify Workshop)
…lost in the forests… — Rig Veda
","permalink":"/quotes/","summary":"A collection of things well said","title":"Inspiring Quotes"},{"content":"
Mitra Robot, created by Invento Robotics, was launched by Prime Minister Narendra Modi and Ivanka Trump, advisor to the President of the United States, in November 2017 at the Global Entrepreneurship Summit (GES) 2017 conference held in Hyderabad.
Based in Bengaluru, India, Invento Robotics (Invento) was a start-up that manufactured humanoid robots. In October 2017, Invento received a mandate to develop a humanoid robot to welcome India's prime minister and the US president's senior adviser at the Global Entrepreneurship Summit that year in Hyderabad. It was a high-stakes event for Invento; a positive reception to the robots would mean many leads for the company, while a negative response could mar Invento's reputation. While Invento's time had hitherto been spent on research and development, there was pressure for it to now determine an overall marketing strategy. It was imperative that Invento quickly crystallize ideas about its customers, product positioning, branding, pricing, distribution channels, and promotions.
Harshvardhan, M. and Kumar B. (2020), "Invento Robotics: Launching Humanoid Robots". Case study accepted by Ivey Cases, Ivey Business School.
Harshvardhan, M. and Kumar B. (2020), "Teaching Note: Invento Robotics: Launching Humanoid Robots". Teaching note accepted by Ivey Cases, Ivey Business School.
","permalink":"/invento-robotics/","summary":"Marketing case study on launch of Mitra — a humanoid robot","title":"Invento Robotics"},{"content":"Write Markdown in LaTeX for Notes
I often write my "chain of thought" in documents before writing the full page. I used to use the verbatim environment for this.
Now, I've found an alternative — Markdown directly!
First, make sure you import the listings and xcolor packages.

\usepackage{listings}
\usepackage{xcolor}

Then, create a custom environment:

\lstdefinestyle{markdownstyle}{
  basicstyle=\ttfamily\footnotesize,
  breaklines=true,
  moredelim=[is][\textbf]{**}{**},
  moredelim=[is][\textit]{*}{*},
  moredelim=[is][\textcolor{blue}]{`}{`},
  frame=single,
  backgroundcolor=\color{gray!10},
}
\lstnewenvironment{markdown}{
  \lstset{style=markdownstyle}
}{}

Now, whenever you want to write notes, you can put them in the "markdown" environment.

\begin{markdown}
Significance of research and prior work
Include
- Discussion on novelty compared to previous work
- Why is it a hard problem?
\end{markdown}

Cosmetics
Bold math symbols: Use the command \mathbf{} to write bold-faced symbols like matrix variables.
Margins: The easiest way is to add \usepackage[margin=0.5in]{geometry} in the preamble.
Outer quotes: LaTeX doesn't understand " as outer quotes. By default, you have to use ``. Here is a way out.

\usepackage[autostyle, english = american]{csquotes}
\MakeOuterQuote{"}

Some handy commands: \hfill, \vfill, \hskip, \vskip, \hspace, \vspace. Just Google their usage. They're needed for extra spaces here and there in LaTeX documents.
Horizontal line: \hrulefill for all non-tabular environments.
Images
Inserting images: Use \usepackage{graphicx,graphics} in the preamble. Then, add an image with the following code block.

\begin{figure}
  \centering
  \includegraphics[width=5in]{example.png}
  \caption{An example of fitting \texttt{GP} model in 1-d function with seven data points.}
  \label{fig:example}
\end{figure}

Tables
Inserting a table: Use Table Generator online. Create the schema and then fill in the content.
Resize LaTeX tables to column width or text width using resizebox:

\usepackage{graphics}
% ...
\begin{table}
  \centering
  \resizebox{\columnwidth}{!}{%
  \begin{tabular}{r|lll}
    \multicolumn{1}{r}{} & \multicolumn{1}{l}{Heading 1} & \multicolumn{1}{l}{Heading 2} & \multicolumn{1}{l}{Heading 3} \\ \cline{2-4}
    Row 1 & Cell 1,1 & Cell 1,2 & Cell 1,3 \\
    Row 2 & Cell 2,1 & Cell 2,2 & Cell 2,3
  \end{tabular}%
  }
\end{table}

Exact Math Symbols
argmin and argmax: Use this in the preamble:

\DeclareMathOperator*{\argmax}{argmax}
\DeclareMathOperator*{\argmin}{argmin}

Then, \underset{x} \argmax f(x) or \underset{x} \argmin f(x). This might not be very right according to this thread, but okay — it serves the purpose. If you find something better, tell me.
Sum (sigma): \sum_{i = 1}^{n} x_n
Integral: \int_a^b f(x) \di x
Other math symbols: Overleaf and the OEIS Wiki.
Tab: A simple tab (horizontal space) can be achieved with \quad.
Drawing any symbol: It is difficult to find the appropriate symbol every time, so use Detexify to identify what you need.
Writing algorithms in LaTeX: use algorithm and algorithmic. See this article for a quick review.
Typesetting Exactly
To place pictures exactly in a slide, use the tikz package.
Exact coordinates by cm: \\begin{tikzpicture}[remember picture,overlay] %% (x coord, y coord) -\u0026gt; (0 cm, 6.5 cm) \\node[anchor=south west,inner sep=0pt] at ($(current page.south west)+(0cm,6.5cm)$) { {picture.png} }; \\end{tikzpicture} The best part is that it also works for exact text placements.\nReferencing and Cross-referencing Bibliography and References: Understand that in most academic writings they are different and Latex considers references as default. To add them, add following lines at the end of file, just before \\end{document}. \\bibliographystyle{apalike} \\bibliography{bibfile} Don\u0026rsquo;t forget to add \\usepackage{natbib} in the preamble. Note that bibfile.bib contains all bibliographies. If you can\u0026rsquo;t get the BibTeX citations right, use Google Scholar.\nCitations Generator: Use this tool online to generate citations: https://truben.no/latex/bibtex/. Meta New Commands: Outline format is \\newcommand{newname}{definition}. See this and this for more details.\nStyle File (.sty): Basically, they\u0026rsquo;re instructions that can be used to redefine the preexisting values in the document. See my Github for two examples that I\u0026rsquo;ve created - one for homework assignments and other for IIM Indore\u0026rsquo;s official presentation.\nTemplates Overleaf Gallery is the best. Otherwise, you can find some repositories on Google.\nBy Me: If you are searching for reports or presentations, or are an IIM Indore student looking for presentations, check my templates.\n","permalink":"/latex/","summary":"Typesetting isn\u0026rsquo;t that easy","title":"Notes on LaTeX"},{"content":"Podcasts TED: TED\u0026rsquo;s novelty is not a secret. Watch it; you will indeed find an exciting idea.\nTalks At Google: Each time, they invite a great person to share their ideas about unconventional topics. They are pretty brave \u0026mdash; not shying away from calling people who are critical about Google\u0026rsquo;s activities. You may like to start with Yuval Noah Harari\u0026rsquo;s episode.\nPlanet Money: This is an easy listen with unique insights with practical economics lessons and examples.\nLinear Digressions: All about machine learning and statistics.\nComics Calvin and Hobbes: Calvin\u0026rsquo;s view of the world represents an exciting, marvellous vision of the world around him. Hobbes is the perfect companion. Their journey is like two adults taking a world tour in children\u0026rsquo;s body. Industry Research Blogs Reserach at Spotify\nFacebook\u0026rsquo;s General Blog\nResearch at Facebook\nThe Unofficial Google Data Science Blog\nGoogle AI Blog\nNetflix Tech Blog\nNetflix Research\nNetflix Blog\nPersonal Blogs / Essays Vicki Boykis\nXimena Vengoechea\nWho is \u0026lsquo;Red\u0026rsquo;? - Red Russak\nDebarghya Das\nYihui Xie | 谢益辉\nAlison Hill\nJacqueline Nolis\nHome | The Tidy Trekker\nijeamaka anyene\nBrendan Cullen\nAllison Horst\nStephanie Kirmer\nCédric Scherer - Cédric Scherer\nAmber Thomas Data Portfolio \u0026amp; Blog | Amber Thomas Data Portfolio \u0026amp; Blog\nJesse Mostipak\nSara A. 
Stoudt, PhD\nHooked on Data, Emily Robinson\nKara Woo\nGreg Wilson\nHaystacks by Caitlin Hudon\nLittle Miss Data\nCogito, Ergo Sumana\ndanluu.com\nPaul Graham\nKevin Kelly\nVicki Boykis\nGreg Wilson\nBen Kuhn\nNate Soares\nSam Altman\nData Science Advice Blogs Advice to aspiring data scientists: start a blog\nNon-Academic Careers for Astronomers and Physicists\nConcept of Digital Garden Vicki Boykis | Your public garden | RStudio::rconf, Youtube\nHow Do You Create A Phenomenal Job Talk?\nAmelia McNamara on Twitter: \u0026ldquo;\u0026ldquo;Things that are still on your computer are approximately useless.\u0026rdquo; -@drob #eUSR #eUSR2017https://t.co/nS3IBiRHBn%22/ Twitter\nIndustry Blogs Facebook: https://research.facebook.com/blog/\nNetflix: https://netflixtechblog.com/\nGoogle: https://ai.googleblog.com/, https://blog.google/technology/research/\nSpotify: https://engineering.atspotify.com/, https://research.atspotify.com/blog/\nTechnical Ideas Transfer Learning - Machine Learning\u0026rsquo;s Next Frontier\nAlgorithm Archive · Arcane Algorithm Archive, Encyclopedia of algorithms\nProgramming Advent of Code is a programming challenge.\nProject Euler is similar.\nR My Opinion on R\u0026rsquo;s Upcoming Pipe\nWhy performance is not the issue in R Why R? 2020 Discussion Panel - Performance in R, Youtube Video\nQuanteda Why R? 2020 | Ken Benoit - Why you should stop using other text mining packages and embrace quanteda\nVisualising distributions with raincloud plots and how to create them with ggplot2\nAn R User\u0026rsquo;s Note on Learning Python\nJitter in boxplots to show distribution\nUsing databases with Shiny\nMartin Wattenberg On Visualizing Large Textual Data | FlowingData\nR as a First Programming Language\nDo We Need Object Oriented Programming in Data Science? | by Rose Day | Oct, 2020 | Towards Data Science\nOther Good Reads Articles Tech is improving at an exponential rate. Here\u0026rsquo;s what that means for society\nA free introduction to quantum computing and quantum mechanics, A New Kind of Book\nMaps, Simpson\u0026rsquo;s Paradox, Sienfield, and what not.\nHere\u0026rsquo;s Why Movie Dialogue Has Gotten More Difficult To Understand (And Three Ways To Fix It)\nEncyclopedic Knowledge, Then vs. Now - The New York Times\n12ft | Microsoft Encarta Dies After Long Battle With Wikipedia - The New York Times\nEarly Civilizations Had It All Figured Out | The New Yorker\nThe Surprisingly Big Business of Library E-books | The New Yorker\nMartha\u0026rsquo;s Rules for Group Decision Making\nGreg Wilson\u0026rsquo;s Ideas\nSearching for Susy Thunder\nTwitter Threads Jessica Price on Twitter: \u0026ldquo;Since I\u0026rsquo;ve gotten a lot of \u0026ldquo;why won\u0026rsquo;t you debate me?\u0026rdquo; in response to talking about tabletop/geekdom\u0026rsquo;s problems, I\u0026rsquo;ll tell you.\u0026rdquo; / Twitter\nNaval on Twitter: \u0026ldquo;How to Get Rich (without getting lucky):\u0026rdquo; / Twitter\nBlogs (Maths) The Mathematics of Mind-Time\nA Brilliant Biography about about Gödel Kurt Gödel\u0026rsquo;s Brilliant Madness\nHow to imagine four or more dimensional objects Intuitive crutches for higher dimensional thinking\nPrint Email me if you need a scanned copy of these items (for personal use only)\nChapter on Mental Health in Bitwise: A Life in Code\nPlayboy Interview: Steven Jobs, The best interview I\u0026rsquo;ve ever read by a margin\nIt\u0026rsquo;s Happened Again by Oscar Schwartz. Wired. Print: February 2022\nThe Big Inhale by Vince Beiser. Wired. 
Print: February 2022\nYouTube How To Speak by Patrick Winston, MIT OCW\nNASA | Thermonuclear Art \u0026ndash; The Sun In Ultra-HD (4K)\nA Buddhist asks Sadhguru a Puzzling Question #Vipassana\nMost People Don\u0026rsquo;t Know How Bikes Work\nJoe Rogan Experience #1309 - Naval Ravikant - YouTube\nCool Apps Chrome Tab Link Copier - Chrome Web Store\nMac jaywcjlove/awesome-mac: Now we have become very big, Different from the original idea. Collect premium software in various categories.\nMac serhii-londar/open-source-mac-os-apps: 🚀 Awesome list of open source applications for macOS. https://t.me/s/opensourcemacosapps\nTodoist Blogs Benefits of Walking Meetings Can Walking Meetings Lead to Healthier Lives?\n60+ Resources for Students Who Want to Stay Productive\nHedonic Adaptation: Why we\u0026rsquo;re never satisfied with what we\u0026rsquo;ve already accomplished\nThe Mere Urgency Effect\nThe Planning Fallacy: Why We\u0026rsquo;re Terrible at Setting Realistic Deadlines\nThe Zeigarnik Effect: Why Unfinished Work Follows Us Home\nComplexity Bias: Why We Overcomplicate Our Lives\nThe Two-Minute Rule: Stop Procrastinating With This Simple Trick\nThe Life-Changing Magic of Tidying Up Your To-Do List\nThe Sunk Cost Fallacy \u0026amp; Your Productivity - Doist Blog\n","permalink":"/what-i-find-interesting/","summary":"\u003ch3 id=\"podcasts\"\u003ePodcasts\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cstrong\u003eTED:\u003c/strong\u003e TED\u0026rsquo;s novelty is not a secret. Watch it; you will indeed find an exciting idea.\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cstrong\u003eTalks At Google:\u003c/strong\u003e Each time, they invite a great person to share their ideas about unconventional topics. They are pretty brave \u0026mdash; not shying away from calling people who are critical about Google\u0026rsquo;s activities. You may like to start with Yuval Noah Harari\u0026rsquo;s episode.\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cstrong\u003ePlanet Money:\u003c/strong\u003e This is an easy listen with unique insights with practical economics lessons and examples.\u003c/p\u003e","title":"What I Find Interesting"},{"content":"In 1858, after India\u0026rsquo;s first war of independence, the British Empire set up a new government executive arm \u0026mdash; the Indian Civil Service. Trained at the University of Oxford, University of Cambridge, University of London and Trinity College, Dublin for two years, these elite executives were responsible for overseeing all government activities in the British Raj. Today, they train at the Lal Bahadur Shastri National Academy of Administration for two years with very similar content. Soon after, they join different departments of government and lead the personnel.\nAnyone who knows the IAS \u0026mdash; the civil service has officially been known as the Indian Administrative Service since independence \u0026mdash; would agree it requires significant reforms. India is world-famous for bureaucracy, and it shows up in whatever task you pick up. From registering birth certificates to transferring land deeds, you require these officers\u0026rsquo; approval for everything.
In the past, the services have been accused of institutional corruption, inefficiency and misalignment, bribery, misappropriation of funds and abuse of power \u0026ndash; least reported but most serious.\nHere is what the Carnegie Endowment for International Peace said almost five years ago:\nThe IAS is hamstrung by political interference, outdated personnel procedures, and a mixed record on policy implementation, and it is in need of urgent reform. The Indian government should reshape recruitment and promotion processes, improve performance-based assessment of individual officers, and adopt safeguards that promote accountability while protecting bureaucrats from political meddling.\nAll of these points are one hundred per cent applicable today.\nWhy is hiring government workers damaging to the government treasury \u0026mdash; at least in the present style? Hiring a full-time employee as a civil servant is costly, slow and risky. The government has to train them as jacks of all trades, on top of paying high salaries and unreal retirement benefits. The process of floating a recruitment notification, conducting exams and interviews, and training has a two-year lead time. Even then, their expertise would be limited compared to a consultant, economist or public policy expert who has been working for several years in the area.\nLet me take the example of the most coveted government job \u0026mdash; the IAS. Once the combined results of the written exam and the interviews are out, higher-ranked individuals choose administrative positions, which means working as district collectors, government secretaries, among others. The next popular choice is the police service, which gives India its police chiefs. Many other groups, like forest officers and foreign officers (ambassadors), are all chosen through this single exam. Do you think one standardised test can identify so many different kinds of officers reliably and correctly?\nAfter an officer is hired, the government has to pay through the nose on multiple fronts. In the name of job security, governments provide allowances that exceed the base pay. Dearness allowances, aimed at tackling inflation, increase at rates much higher than actual inflation. Additionally, the base pay multiplies every few years. As economists put it, the effective cost to the treasury doubles every four to five years.\nThe retirement benefits are immense. One is awarded an inflation-adjusted pension for life. After their death, the allowance continues to be paid to their family. Most importantly, these officers cannot be discharged from their job. In case an officer is found breaking the rules, there is little that governments can do. At most, you would be posted to a remote location or, in a grave situation, suspended. Firing is virtually impossible.\nTo a naive reader, all of this sounds like unlocking an achievement of immunity and security, which is how everyone in India treats a government job.\nNon-permanent hiring might look expensive on paper. Typical job listings would offer pay much higher than the base pay of government employees. But because these officers are hired for shorter periods with no permanent benefits, there is a significant reduction in cost to the treasury. Limiting off-payslip benefits like pensions and dearness allowances would reduce the exchequer\u0026rsquo;s bill.\nWhy are government employees not equipped to handle modern administration and transformation?
Around 2014, many state governments sought the help of private players in transforming their state education systems. In 2014, the Haryana government involved BCG to chart out plans for modernising state education. It tried various academic reforms in the curriculum, roped in multiple non-profits for regular funding and instituted weekly tests. Around the same time, the Rajasthan government also roped in BCG to channelise system-level reforms in its state education. The most impactful results were seen in Delhi\u0026rsquo;s education policy. I don\u0026rsquo;t want to get into the actual impact, as BCG documented it well here.\nThe Delhi government\u0026rsquo;s case of improving education has proved that consultancy is cheaper than in-house public work for problem-solving. It worked out so well that then US First Lady Melania Trump included a visit to a Delhi public school in her itinerary. In fact, at 97.8%, Delhi government schools scored their best results ever.\nExternal consultants \u0026mdash; bereft of bureaucracy \u0026mdash; look at the problems differently than government employees do. Today in India, consultants have built a thriving business advising different levels of government. For governments, which work in perpetuity, consultants have become a convenient source of fast and accurate policy reports and analysis. My submission is to take this to the next step.\nWhat is the way out? The solution is easy to imagine and difficult to implement. The government should stop hiring for future positions. Instead, it should rope in interested people and private organisations to complete the same job. Next, the government should double down on lateral hiring and hire experienced, expert individuals at secretariat positions. If inexperienced graduates are necessary, graduates from IIMs, IITs, JNU, etc., are suitable for the required functions. NITI Aayog\u0026rsquo;s and the Ministry of Finance\u0026rsquo;s recent hirings are glowing examples of how this process could work.\nI got the idea to write this article because someone shared a picture of the health benefits of a government job with me. It showed how the government pays for employees\u0026rsquo; and their parents\u0026rsquo; health requirements. I quickly noted that such family insurance would cost ₹ 2000 per month. With five years of experience, one would earn a lot more working for a private company and easily afford the difference of ₹ 2000 per month. I had an urge to write my ideas in a post. Welcome to my TED talk. :grin:\n","permalink":"/how-is-governments-employment-system-broken/","summary":"My thoughts on why government hiring is broken and needs system-level redesign to improve.","title":"How is Government's Employment System Broken?"},{"content":"\nLive Project Surveyed 300+ participants and interviewed around 20 stakeholders like nutrition experts, gym trainers, etc.\nAnalysed data gathered using SPSS to decipher awareness of nutrition products and to know customer segments, target groups and their behavioural characteristics\nStudied the product\u0026rsquo;s competitors, marketing platforms and distribution channels to recommend a new digital marketing strategy ","permalink":"/bliss-lifesciences/","summary":"Marketing research live project","title":"Bliss Lifesciences"},{"content":"Are COVID-19 numbers reported by countries altered? I test the validity of COVID-19 new daily cases for every country using Benford\u0026rsquo;s Law. Since the pandemic gained global centre stage, there has been a surge in data manipulation accusations.
Independent media agencies questioned country-level data, and all of us drew our own conclusions about whether the data was correct.\nBenford\u0026rsquo;s law tells us that in many naturally occurring collections of numbers, the leading digit is likely to be small (Wikipedia).\nTake any naturally occurring dataset. If you count the frequency of the first digits, 1 appears around 30 per cent of the time, 2 around 18 per cent, \u0026hellip;, and 9 around 5 per cent of the time. This amazingly simple \u0026ldquo;law\u0026rdquo; can be authoritative grounds to start an investigation \u0026mdash; following natural justice, we shouldn\u0026rsquo;t call anyone guilty of manipulation just on this violation.\nMathematically, the digits\u0026rsquo; occurrence probability can be modelled using Benford\u0026rsquo;s Distribution, with the following probability distribution function.\n$$ P(d) = \\log_{10}(d+1) - \\log_{10}(d) = \\log_{10}\\left(\\frac{d+1}{d}\\right) = \\log_{10} \\left(1 + \\frac{1}{d}\\right) $$\nThe probability of each digit comes out exactly as follows: P(1) ≈ 30.1%, P(2) ≈ 17.6%, P(3) ≈ 12.5%, P(4) ≈ 9.7%, P(5) ≈ 7.9%, P(6) ≈ 6.7%, P(7) ≈ 5.8%, P(8) ≈ 5.1% and P(9) ≈ 4.6%.\nThe law is so universal that the Income Tax Department uses it to detect fraud, legal cases have admitted it as evidence, regulators analyse prices to see cartel-like behaviour, forensics use it to identify deep-fakes and doctored videos, among others \u0026mdash; and, in our case, COVID-19 data reported by countries. The Netflix TV series \u0026ldquo;Connected\u0026rdquo; did an episode \u0026ldquo;Digits\u0026rdquo; on Benford\u0026rsquo;s Law. It is absolutely brilliant and you should watch it.\nApproach My approach is simple and direct. Using the COVID-19 data available at Our World in Data (Johns Hopkins University), I modelled each country\u0026rsquo;s daily cases using R and found the first-digit distribution using the benford package in R.\nI measured how much each country differed from the expected proportions as the root-mean-square error (RMSE). A lower RMSE value would mean more accurate data reporting.\nResults Explore the world map below to see evidence of manipulation for each country.\nR Codes\n##### Map plot\nlibrary(\u0026#34;sf\u0026#34;)\nlibrary(\u0026#34;rnaturalearth\u0026#34;)\nlibrary(\u0026#34;rnaturalearthdata\u0026#34;)\nworld = ne_countries(scale = \u0026#34;medium\u0026#34;, returnclass = \u0026#34;sf\u0026#34;)\n# world1 holds each country\u0026#39;s ISO code and its RMSE value, computed earlier\nworld1 = merge(world, world1, by.x = \u0026#34;iso_a3\u0026#34;, by.y = \u0026#34;iso\u0026#34;)\nworld_points \u0026lt;- cbind(world, st_coordinates(st_centroid(world$geometry)))\nggplot(data = world1) + theme_bw() +\n  geom_sf(aes(fill = rmse)) +\n  geom_text(data = world_points, aes(x = X, y = Y, label = name), col = \u0026#34;grey\u0026#34;, check_overlap = T, size = 1.5) +\n  scale_fill_viridis_c(option = \u0026#34;plasma\u0026#34;) +\n  labs(x = \u0026#34;\u0026#34;, y = \u0026#34;\u0026#34;, fill = \u0026#34;RMSE\u0026#34;,\n       caption = \u0026#34;Benford analysis is used for fraud detection. Low RMSE is associated with low fraud probability.\\nData from Johns Hopkins University (Our World in Data, May 23, 2021).
Analysis and viz by Harshvardhan.\u0026#34;,\n       title = \u0026#34;Are Countries Manipulating COVID-19 Data?\u0026#34;,\n       subtitle = \u0026#34;Benford Analysis on COVID-19 Daily Cases\u0026#34;)\nggsave(\u0026#34;country.png\u0026#34;)\nAs of May 23, 2021 The countries with very little evidence of manipulation \u0026mdash; RMSE less than 3 \u0026mdash; are the following.\nComoros, Somalia, Dominica, Republic of Congo, Vatican, Solomon Islands, United Republic of Tanzania, Samoa, Vanuatu, Marshall Islands, Federated States of Micronesia.\nThe countries with very strong evidence of manipulation \u0026mdash; RMSE more than 22 \u0026mdash; are the following.\nBelarus, Tajikistan, Netherlands, Russia, Egypt, Iran, Iraq, Qatar, South Korea, Colombia, Italy, Turkey, Honduras, Brazil, China, Kuwait, Mexico, Algeria, El Salvador.\nMy home, India, doesn\u0026rsquo;t show strong evidence of manipulation (RMSE = 7.32). China (RMSE = 22.6) and Russia (RMSE = 28.9) show major evidence of manipulation. These results are as of May 23, 2021 and can change in future. Use the app below for live results.\nIf you are curious about any specific country, I made a Shiny app for each country on live data. Check it out below or here.\nThe above app takes in live COVID-19 cases data from Our World in Data and analyses it using R Shiny. Therefore, the numbers and graphs are updated frequently.\nHere is the complete list of countries and their RMSE values (as of May 31, 2021). The codes for the Shiny app, the plots, etc. are included in this Github repository.\nOf course, much more detailed analysis is required to conclude anything confidently. Benford\u0026rsquo;s Law can give misleading conclusions, as with the 2020 US elections, and it might as well be the case here. This is only a first-level analysis. Beyond first-order checks, expert eyes are required to judge how fair Benford\u0026rsquo;s Law is in this case.\n","permalink":"/is-covid-19-data-tampered/","summary":"Is there any evidence of tampering or manipulation in COVID-19 daily cases reported by countries? Using Benford analysis in R, I try to reach some conclusion.","title":"Is COVID-19 Data tampered?"},{"content":" Central Limit Theorem App: app.R\nSlides 2021: here\nSlides 2020: here\nRecently, I gave a presentation on R Shiny Apps to an FPM (PhD) class. Although the session was quite brief in itself, we developed a small central limit theorem demonstrator. It\u0026rsquo;s not feature-rich or anything - it just serves the purpose of seeing the central limit theorem in action. (A minimal sketch of this kind of app appears below.) Thanks a lot to Prof Pritam Ranjan for giving me this wonderful opportunity.\nFrom RStudio \u0026ndash; Shiny Apps:\nShiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.\nIn this small guest lecture, my purpose was not to demonstrate what to do with Shiny apps, as the uses are virtually endless. My goal was to leave them on a good footing so that they can pick it up as and when required. The best thing about Shiny is its ease of use.\nUpdate on Feb 13, 2021 I delivered this guest lecture again for the next iteration of the course. This time, it was a full session \u0026mdash; I got 75 minutes to explain Shiny apps.
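Here is a minimal sketch of such a CLT demonstrator. It is my reconstruction for illustration, not the original app.R: it draws repeated samples from a skewed (exponential) population and plots the distribution of sample means.\nlibrary(shiny)\n\nui \u0026lt;- fluidPage(\n  titlePanel(\u0026#34;Central Limit Theorem Demo\u0026#34;),\n  sliderInput(\u0026#34;n\u0026#34;, \u0026#34;Sample size\u0026#34;, min = 1, max = 100, value = 30),\n  sliderInput(\u0026#34;reps\u0026#34;, \u0026#34;Number of samples\u0026#34;, min = 100, max = 5000, value = 1000),\n  plotOutput(\u0026#34;hist\u0026#34;)\n)\n\nserver \u0026lt;- function(input, output) {\n  output$hist \u0026lt;- renderPlot({\n    # distribution of sample means from an exponential population\n    means \u0026lt;- replicate(input$reps, mean(rexp(input$n, rate = 1)))\n    hist(means, breaks = 40, col = \u0026#34;steelblue\u0026#34;,\n         main = \u0026#34;Sampling distribution of the mean\u0026#34;,\n         xlab = \u0026#34;Sample mean\u0026#34;)\n  })\n}\n\nshinyApp(ui, server)\nAs the sample-size slider increases, the histogram visibly approaches a normal shape, which is the whole point of the demonstration.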
I enjoyed it and the feedback was much better than last year.\n","permalink":"/demonstrating-shiny-apps/","summary":"Guest lecture for PhD students on R Shiny (2020 and 2021)","title":"Demonstrating Shiny Apps"},{"content":" For the preprint of the actual paper, click here.\nMalaria is a mosquito-borne disease caused by Plasmodium, a malarial parasite. Although Malaria is not life-threatening by its nature, it can cause severe illness and prove fatal if left untreated.\nIn February 2019, a new Malaria vaccine, RTS, S — known by the trade name Mos-Quirix — was approved for human trials in Ghana, Malawi and Kenya, coordinated by WHO. The study is expected to be over by December 2022. However, several pharmaceutical majors have begun showing interest in the vaccine’s mass production in the last few months.\nThe companies want to estimate the coverage ratio — defined as the vaccinated population count divided by the total population. This research aimed to forecast the same for all 78 affected countries using the Dynamic Gaussian Process Model.\nModel A vaccine’s coverage in a country or geography depends on several factors: how effective is the vaccine? How many people are scared of not taking the vaccine? How many doses does it require? Is the disease contagious?\nEach country will have specific characteristics which are difficult to quantify. Therefore, the best approach is to group countries with “similar” features instead of building only one model for all. We decided to work with the Human Development Index for grouping countries in this work.\nHDI heatmap of world Variables in Modelling The dependent variable (Y) was the time-series coverage ratio of the Malaria vaccine for the next T years. X1 to X6 were independent variables.\nY = time-series of coverage ratios for the next T years\nX1 = Dosage number. The value is k if k doses of the vaccine have already been given. Multiple dosages result in lower coverage.\nX2 = Dosage time. The number of months after birth when the first dosage is taken; 0 represents ‘at birth’. Typically, vaccines given at birth have higher coverage as there’s no need to return to the hospital.\nX3 = Efficacy. The ability of the vaccine to actually prevent the disease. Higher efficacy creates stronger motivation for vaccination.\nX4 = Incidence per lakh. It is more likely that parents will give their children the vaccine if the disease’s occurrence is high. When incidences are high, the population is more careful about prevention.\nX5 = Communicability. Fear of contagion drives vaccination.\nX6 = Years active. As time passes, coverage naturally grows.\nImplementation The training data was too big to fit a full svdGP model on a standard laptop. We implemented the localized model (i.e., lasvdGP) developed by Zhang et al. (2018) for the model fitting. The localized model considers only observations that “closely resemble” each new point for modelling, instead of considering all points.\nOf the ten countries in each group, not all are used for modelling. Instead, a selected few for each variable and observation are used. This “closeness” is decided based on clustering within the country group.\nWe executed all the methods with the DynamicGP package in R. (A toy illustration of the SVD idea behind these models appears below.)\nResults From what was known, we considered that the first dose of the vaccine (X1 = 0) was given to a six-month infant (X2 = 6). Malaria is known to be non-communicable (X5 = 0). We assumed the average incidence value as 60 per lakh (X4 = 60) and the vaccine’s efficacy at 70% (X3 = 70).\nWe forecasted the coverage for the next 38 years using these assumptions and inputs.
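To make the SVD idea concrete, here is a toy, self-contained R sketch. It uses simulated curves that merely stand in for the real coverage series; it is an illustration of the technique, not our actual modelling code. A matrix of time-series outputs is decomposed into a few basis curves plus per-run coefficients, and each coefficient series can then be emulated by a standard scalar GP.\n# Toy illustration of the SVD step behind the dynamic GP surrogate.\nset.seed(1)\nn_runs = 50; n_times = 38\ntt = 1:n_times\n# fake coverage curves: saturating growth with run-specific rates\nrates = runif(n_runs, 0.05, 0.3)\nY = sapply(rates, function(r) 1 - exp(-r * tt) + rnorm(n_times, sd = 0.01))\ns = svd(Y) # Y is n_times x n_runs; SVD over the time component\n# keep enough components to explain ~95% of the variation\nk = which(cumsum(s$d^2) / sum(s$d^2) \u0026gt;= 0.95)[1]\nbasis = s$u[, 1:k, drop = FALSE] # basis curves over time\ncoefs = diag(s$d[1:k], k) %*% t(s$v[, 1:k]) # k scalar series, one per basis\n# each row of coefs is a scalar response a standard GP can emulate;\n# predictions recombine as basis %*% predicted_coefs\nmax(abs(Y - basis %*% coefs)) # small truncation error\nIn the actual paper, each coefficient series is modelled with a localised GP, which is what lasvdGP in the DynamicGP package automates.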
The choice of 38 years was arbitrary; the simulation could handle more periods. However, the longer the horizon considered, the less reliable the forecast.\nHere is the vaccine coverage at two crucial time points: soon after the launch and at the end of the simulation (38 years after launch).\nVaccine coverage at t=0 and t=38 Vaccine coverage through the years is as follows:\nvaccine coverage ratio over the years What’s Even More Interesting As evident from the figure, some countries start with higher coverage ratios and still lead thirty years down the line. Some groups, like group 8, remain low for the entire duration. Groups 9 and 10 catch up quickly with groups 1 and 2.\nCountries that score low on HDI get all the attention from NGOs and their ilk. They receive higher external funding and support.\nCountries that score high on HDI have established infrastructure and facilities to roll out the programs quickly. The countries in groups three to eight are less well off, with eight being the worst.\nThere are also seasonal trends. The coverage ratios show spikes at the end of every decade from the launch year. This could be due to ‘anniversary’ coverage news and attention. Also, the agencies responsible for vaccinating might be pushing themselves to complete their 10-year targets.\nHowever, a more in-depth study is necessary to draw any definitive conclusions.\nEvery year, Malaria takes lakhs of lives. The Malaria vaccine is currently in field trials in three countries. Our model says the day is not far when we will say ta-ta, bye-bye to Malaria! Professor Pritam Ranjan’s and my research, covered in @rpbreakingnews. @IIM_I pic.twitter.com/7m7sKDFUJj\n\u0026mdash; Harshvardhan (@harshbutjust) September 7, 2022 ","permalink":"/forecasting-malaria-vaccine-demand/","summary":"Summary of my research work on forecasting coverage for Malaria vaccines","title":"Forecasting Malaria Vaccine Demand"},{"content":"We applied a dynamic Gaussian process model to predict coverage for novel Malaria vaccines in 78 countries. Using publicly available WHO data on coverage of nine vaccines, we developed localised models for countries grouped using the human development index (HDI). We deployed convolutions of standard GP models with weights determined using singular value decomposition of the time-series response matrix.\nAbstract Gaussian process (GP) based statistical surrogates are popular, inexpensive substitutes for emulating the outputs of expensive computer models that simulate real-world phenomena or complex systems. Here, we discuss the evolution of the dynamic GP model — a computationally efficient statistical surrogate for a computer simulator with time series outputs. The main idea is to use a convolution of standard GP models, where the weights are guided by a singular value decomposition (SVD) of the response matrix over the time component. The dynamic GP model also adopts a localized modeling approach for building a statistical model for large datasets. In this chapter, we use several popular test function based computer simulators to illustrate the evolution of dynamic GP models. We also use this model for predicting the coverage of Malaria vaccine worldwide. Malaria is still affecting more than eighty countries concentrated in the tropical belt. In 2019 alone, it was the cause of more than 435,000 deaths worldwide. The malady is easy to cure if diagnosed in time, but the common symptoms make diagnosis difficult.
We focus on a recently discovered reliable vaccine called Mos-Quirix (RTS,S) which is currently going under human trials. With the help of publicly available data on dosages, efficacy, disease incidence and communicability of other vaccines obtained from the World Health Organisation, we predict vaccine coverage for 78 Malaria-prone countries.\nCitation Ranjan, P. and Harshvardhan, M. (2022). \u0026ldquo;The Evolution of Dynamic Gaussian Process Model with Applications to Malaria Vaccine Coverage Prediction\u0026rdquo;. In D.D. Hanagal, R.V. Latpatel \u0026amp; G. Chandra (Eds.), Applied Statistical Methods: ISGES 2020. Springer. Singapore.\nLinks PDF: https://www.harsh17.in/docs/malaria_paper.pdf arXiv: https://arxiv.org/abs/2012.11124 Google Books: https://www.google.co.in/books/edition/Applied_Statistical_Methods/h1-vzgEACAAJ?hl=en Blog: https://www.harsh17.in/forecasting-malaria-vaccine-demand/ Springer: https://link.springer.com/book/9789811679322 ","permalink":"/dynamic-gp-application-to-malaria-vaccine-coverage-prediction/","summary":"We applied a dynamic Gaussian process model to predict coverage for novel Malaria vaccines in 78 countries. Using publicly available WHO data on coverage of nine vaccines, we developed localised models for countries grouped using the human development index (HDI). We deployed convolutions of standard GP models with weights determined using singular value decomposition of time-series response matrix. \u003ca href=\"https://www.harsh17.in/docs/malaria_paper.pdf\"\u003e🔗 PDF\u003c/a\u003e","title":"Dynamic GP: Application to Malaria Vaccine Coverage Prediction"},{"content":"IPM, IIM Indore is an unconventional programme. In a country like India when most high school students think of engineering or medicine as the only possible career option, clearly an eclectic course like IPM will attract attention \u0026ndash; and rightly so. However, being relatively new programme (less than ten years old), little reliable information is available.\nMany aspirants of the programme have previously contacted me for \u0026ldquo;tips\u0026rdquo; to the entrance. Here\u0026rsquo;s an anthology of preparation that I usually suggest.\nNote: Before you go further, I\u0026rsquo;d suggest you to take a look at the official website.\nEntrance Exam It is called IPM-AT. Questions are of two kind: verbal and quants. IIM Indore also releases sample questions every year (here\u0026rsquo;s 2019 paper). The sample questions will give you a clear idea of what is expected. If someone around you has a CAT preparation book, grab it now. The question pattern is largely the same.\nBooks and Online Resources Books dueNorth: https://www.amazon.in/dp/935306970X/ref=cm_sw_em_r_mt_dp_Yo4tFb4BT7VCT Disha: https://www.amazon.in/dp/9389418976/ref=cm_sw_em_r_mt_dp_Rq4tFbZRK5SYD The content of the books is largely the same. Note that, I\u0026rsquo;ve not read either of the books; they are the popular choices that my friends found helpful.\nOnline Resources (Free) 2IIM: https://online.2iim.com/Indore-IPM-Rohtak-IPM-sample-paper/ 2IIM CAT: https://iim-cat-questions-answers.2iim.com Hitbullseye: https://grad.hitbullseye.com/IPM-Previous-Year-Question-Paper.php These are just few question banks I could find using simple Google search. If you need more, search.\nInterview Interview questions revolve around three things: culture fit (i.e. do you really want to pursue management?), learning ability (do you \u0026ldquo;get\u0026rdquo; what you do?) and general awareness. 
A majority of questions would be around your past academics; brush up your class 11th and 12th subjects. You can read my interview experience on Quora.\nEssentially, you can do three things to prepare for your IPM interview:\nCatch up with news: read newspapers \u0026ndash; online or offline. Know about regional developments and national economic policies. More importantly, have opinions about them (but don\u0026rsquo;t be arrogant about them). Get the dust off your class 11 and 12 books. Revise most conceptual things; theorems in Physics, basic algorithms in Computer Science, etc. Recollect your past achievements and prepare stories around each of them. You may have heard of Situation, Task, Action and Result (STAR) framework. Try to have most, if not all of STAR, in your responses. Don\u0026rsquo;t tell them A is the situation, B is Task and so on. Imbibe them in your story. WAT WAT is a relatively new addition to entrance tests \u0026mdash; it wasn\u0026rsquo;t there in my time. However, from what I\u0026rsquo;ve heard the topics are fairly generic and aim to gauge your language abilities. They are not a test of your general awareness directly.\nHere are a few online resources that might help:\nhttps://www.mbauniverse.com/wat-for-iims/topic.php https://www.mbarendezvous.com/wat-topics/1.html https://gdpi.hitbullseye.com/MBA/GD-and-Essay-Topics.php https://www.xamnation.com/wat-topics-for-iim-and-mba-admissions/ For more topics, just Google.\nIIM Ranchi WAT-PI Compendium This document by IIM Ranchi should help you immensely in preparing for interviews and WAT.\nFinally! If you do need additional help \u0026mdash; anything that\u0026rsquo;s beyond the content of this page \u0026ndash; feel free to contact me or any current IPM student. If you don\u0026rsquo;t know them yet, LinkedIn is your friend.\nGood luck!\nMy Thoughts on IPM How being an IPMer helped me be a better researcher? Added on May 19, 2022.\nThe courses taught at IIM-I were a valuable combination of fields. Learning statistics, economics and humanities together enriched my thinking methods beyond what a single class could do. Especially, Stat Methods 1 \u0026amp; 2, Economics Statistics and Econometrics have been handy.\nThe programming exposure to different languages allowed me to look beyond one specific language now that I code a lot. I use R and Python mostly these days, but little exposure to Java and C++ goes a long way in debugging.\nSummer research in the first and second-year summer is not well advertised but you can do it. I used that option, though very few (if any) of my classmates even knew about this opportunity. CIS is one way to do research but has limitations. You usually do it for a term, can only do if your GPA is greater than 3 and can\u0026rsquo;t do it until the third year.\nSummer internships go a long way. In my foundational years, exposure to pure statistics interested me in research.\nThe focus on projects is excellent too. We learn essential skills like collaboration (and how to manage social loafing) in group projects. Independent projects improve research skills and seed the idea of self-motivation.\n","permalink":"/ipm-what-you-need-to-know/","summary":"A Quick Primer on How to Prepare for IPM","title":"IPM: What You Need To Know"},{"content":"Funding education is probably the most heavily debated topic in recent years, with both sides having equally strong views. 
Some say education is the state\u0026rsquo;s duty and a citizen\u0026rsquo;s right; others call it the individual\u0026rsquo;s responsibility. But the consensus is that school education is the government\u0026rsquo;s responsibility.\nWe all know the quality of education in government schools and how competition has made them inefficient. In his book \u0026ldquo;Free to Choose\u0026rdquo;, Milton Friedman proposed an excellent solution: divide the total government expenditure on school education by the number of students. Allow everyone to use their share to pay for school fees. The schools can then get the funds back from the government. The students would be wary of spending their coupons on bad schools, fostering competition between schools. Simple yet effective.\nThis article is not about funding schools. It is about college education that many won\u0026rsquo;t consider the government\u0026rsquo;s responsibility.\nNow, I don\u0026rsquo;t think it\u0026rsquo;s justified to burden the government with the educational expenses of college students. A peasant earning a meagre income shouldn\u0026rsquo;t pay (indirectly) for the education of a future engineer who probably will not help that peasant directly. Everyone who benefits from training should pay for it by themselves. In that spirit, the best solution is a flexible payment method - like easy access to loans or paying fees when we get jobs.\nSadly, not all have access to loans. There could be many reasons \u0026mdash; poor or incomplete credit history, inaccessible banks and whatnot. A system where one pays college tuition when they start working is excellent but unfeasible for the colleges as they require money for operations. I had another - in my opinion, better - idea on how to fund a college education. Every college can make a general fund for each batch of each programme. Everyone (corporates, companies, anyone) can buy units of these funds \u0026mdash; something like mutual funds. Then, each student\u0026rsquo;s share is the entire fund\u0026rsquo;s value divided by the number of students. Suppose the fees turn out to be more than an individual student\u0026rsquo;s share. In that case, the student can pool in extra money (and also understand that his course isn\u0026rsquo;t as valuable as the fees charged).\nWhy would anyone buy units of these funds? Well, the student will have to agree to \u0026ldquo;return\u0026rdquo; a portion of their salary - say 10%. It would be like the Swedish pension scheme where a working professional pays for the education they received years ago.\nNot all students will return, as not all will be employable. The return rate can be higher for college programmes, with students agreeing to return higher to the fund. Considering the most recent developments in the job market, the \u0026ldquo;colleges funds\u0026rdquo; will be valued by the market. If the efficient market hypothesis is true, this allocation will be optimal. The colleges get their fees irrespective of fluctuations in the graduate\u0026rsquo;s job outlook since these college funds will launch at the start of every academic year. The investors undertake all risks for the gestation period (when the student is in college). These risks are represented in the \u0026ldquo;returns\u0026rdquo; - what portion of a graduate\u0026rsquo;s salary has to return.\nThis solution can keep all stakeholders happy together: colleges, students and corporates. But, of course, there are severe ramifications to this plan. Who will ultimately manage this fund? 
Do colleges have the capabilities to handle schemes like these? These are all genuine but solvable questions. I\u0026rsquo;m confident we can get experts with sufficient abilities to execute such funds when it comes to implementation.\nI\u0026rsquo;m also reasonably confident this isn\u0026rsquo;t going to be implemented anytime soon. We couldn\u0026rsquo;t pull off the coupons in the school system despite their showcased success. I\u0026rsquo;m not so optimistic about this getting executed.\n","permalink":"/a-better-way-to-fund-college-education/","summary":"How about a listed mutual fund for investing in college education?","title":"A (better) Way to Fund College Education"},{"content":"Two days back, I got curious about the Twitter API. I had worked with a few APIs (using R) in the past but had never chanced upon using Twitter data. Additionally, Twitter provided a rich source of \u0026ldquo;what people are talking about\u0026rdquo;. I searched and found a very easy-to-use package for R called rtweet. Its simplicity blew my mind.\nI went deep into it and found several useful functions like search_tweets(), stream_tweets() and get_timeline(). Of course, there are many more functions; have a look at their reference list.\nBut why just stop there? tidytext makes unigram sentiment analysis very easy. I thought of finding the \u0026ldquo;positive\u0026rdquo; and \u0026ldquo;negative\u0026rdquo; words used on Twitter.\nTo start with, I tracked Kerala\u0026rsquo;s elephant murder: an incident in Kerala where an elephant died, allegedly due to firecrackers bursting in its mouth. This incident grabbed national and international attention, bringing organisations like PETA to the forefront.\nI first searched for the last 10,000 tweets on Twitter, did some cleaning and finally analysed them for sentiments.\nR Codes\nlibrary(jsonlite) # not needed here, but loaded later for the Spotify post\nlibrary(rtweet) # Twitter API medium\nlibrary(ggplot2) # for plotting\nlibrary(dplyr) # for piping operator and handling tibbles\nlibrary(tidytext) # text mining libraries\nlibrary(textdata)\nrt = search_tweets(\u0026#34;Kerala+Elephant,lang:en\u0026#34;, n = 10000, include_rts = F)\n# clear all links\nrt$updated_text = gsub(\u0026#34;https.*\u0026#34;, \u0026#34;\u0026#34;, rt$text)\nrt$updated_text = gsub(\u0026#34;http.*\u0026#34;, \u0026#34;\u0026#34;, rt$updated_text)\n# convert all texts to lowercase and remove punctuation\nrt2 \u0026lt;- rt %\u0026gt;% dplyr::select(updated_text) %\u0026gt;% unnest_tokens(word, updated_text)\n# remove stop words\ndata(\u0026#34;stop_words\u0026#34;)\nrt2 = anti_join(rt2, stop_words)\n# attach each word to its sentiment using the \u0026#34;bing\u0026#34; dictionary\nrt3 = rt2 %\u0026gt;% inner_join(get_sentiments(\u0026#34;bing\u0026#34;)) %\u0026gt;% count(word, sentiment, sort = T) %\u0026gt;% ungroup()\nEssentially, hardly anyone asked why the person did what they did. If it was even purposeful. Everyone just made a smiley face. A sad smiley face. :disappointed:\nWhy Stop There? I thought of analysing the tweets by two world leaders: Narendra Modi (our PM) and Donald Trump (US President).\nI only needed to get two different sets of tweets, and the rest of the code remained the same.\nrt1 = get_timeline(\u0026#34;realDonaldTrump\u0026#34;, n = 10000, include_rts = F)\nrt2 = get_timeline(\u0026#34;narendramodi\u0026#34;, n = 10000, include_rts = F)\nNarendra Modi 🐚 Donald Trump 🗽 Clearly, Modi uses many more \u0026ldquo;positive\u0026rdquo; words than Trump.
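The post does not include the code behind these word charts; here is one way such a comparison could be produced from the word counts in rt3, assuming the dplyr and ggplot2 setup from above (a reconstruction, not the original code).\n# one possible plot of the sentiment word counts computed above\nrt3 %\u0026gt;%\n  group_by(sentiment) %\u0026gt;%\n  slice_max(n, n = 10) %\u0026gt;%\n  ungroup() %\u0026gt;%\n  mutate(word = reorder(word, n)) %\u0026gt;%\n  ggplot(aes(word, n, fill = sentiment)) +\n  geom_col(show.legend = FALSE) +\n  facet_wrap(~sentiment, scales = \u0026#34;free_y\u0026#34;) +\n  coord_flip() +\n  labs(x = NULL, y = \u0026#34;Count\u0026#34;, title = \u0026#34;Most common positive and negative words\u0026#34;)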
Many of Modi\u0026rsquo;s negative words are also probably used in positive and hopeful sentences: poor, needy, etc. Trump\u0026rsquo;s list is characterised by his \u0026ldquo;fake (news)\u0026rdquo;. They both are using words associated with the pandemic: virus, crisis, panic, attack, etc.\nHere\u0026rsquo;s a proportion comparison of the overall sentiment.\nNarendra Modi 🐚 Donald Trump 🗽 There\u0026rsquo;s a marked difference between how the two leaders - Narendra Modi and Donald Trump - tweet. Around 75% positive for Modi; 40% for Trump. Of course, I could go on comparing more, but I exceeded the Twitter request limit and have to wait another 15 minutes. Plus, my aim of gaining a basic understanding of how to use the API and unigram sentiment analysis was achieved.\nHave a great day!\n","permalink":"/twitter-sentiments/","summary":"Using the rtweet package in R to analyse live tweets from Twitter","title":"Sentiment Analysis of Tweets"},{"content":"I have been using Spotify for two years now. I was always curious to know my listening trends, which is why I started using last.fm and other tools like that. However, they didn\u0026rsquo;t give me exact information as to what I listened to \u0026ndash; the insights Spotify gave me were limited to what Spotify thought! I couldn\u0026rsquo;t know more about which artists I listened to, what the top songs from an artist were, and so on.\nTherefore, I looked up my listening data on Spotify (thank you, GDPR!) and then made some nice plots for inferences.\nGetting Streaming History Spotify allows you to download all your data through its accounts webpage. First, you need to request all your data. This step will take anything between a day and a fortnight depending on how much you listen. For me, it took two working days and I got two streaming files. Again, for heavy listeners, there will be more files.\nLet\u0026rsquo;s Play Some Music! First, we will load the streaming history files in R. If there are multiple files, read them all one by one like I\u0026rsquo;ve done here. Finally, combine them all together. Don\u0026rsquo;t forget to remove the ones you don\u0026rsquo;t need \u0026ndash; they take a lot of run-time memory.\nhistory1 = fromJSON(\u0026#34;StreamingHistory0.json\u0026#34;, flatten = TRUE)\nhistory2 = fromJSON(\u0026#34;StreamingHistory1.json\u0026#34;, flatten = TRUE)\nst = rbind(history1, history2)\nrm(history1, history2)\n# what does the dataframe contain?\nhead(st)\nendTime artistName trackName msPlayed\n1 2019-07-26 03:23 A.R. Rahman Ok Jaanu Title Track 206250\n2 2019-07-26 03:26 A.R. Rahman Enna Sona 213632\n3 2019-07-26 03:27 A.R. Rahman Jee Lein 42292\n4 2019-07-26 03:39 Vishal Dadlani Adhoore 104396\n5 2019-07-26 09:25 Karthik Behene De 11719\n6 2019-07-26 13:03 Vishal Dadlani Swag Se Swagat 235944\nArtists To find out which artists I listen to most, let me make a visualisation.\nst %\u0026gt;%\n  count(artistName, sort = TRUE) %\u0026gt;%\n  top_n(15) %\u0026gt;%\n  mutate(artistName = reorder(artistName, n)) %\u0026gt;%\n  ggplot(aes(x = artistName, y = n)) +\n  geom_bar(aes(fill=n), stat=\u0026#34;identity\u0026#34;) +\n  scale_fill_distiller(palette=\u0026#34;Spectral\u0026#34;) +\n  xlab(NULL) + coord_flip() +\n  labs(x = \u0026#34;Artist\u0026#34;, title = \u0026#34;Artists I listened most to\u0026#34;, fill = \u0026#34;Count\u0026#34;) +\n  theme_minimal()\ncount(artistName, sort = TRUE) counts how many times each artist appears, which tells me how much I listen to them. top_n(15) gives me the top 15 entries.
mutate(artistName = reorder(artistName, n)) reorders the tibble, sorted by counts. geom_bar() is used for barplots; scale_fill_distiller() gives a colourful, continuous scale. The rest of the functions are for cosmetic reasons. As you can observe, I listened most to Arijit Singh, Pritam and AR Rahman. The top non-Indian artists here are Post Malone and Billie Eilish \u0026ndash; and only two in the top 15. This is not surprising, as I mostly like Bollywood artists; they mix classical and pop really well.\nTracks Again, I will make a visualisation and use it for inference. The functions and explanations are the same as for artists.\nst %\u0026gt;%\n  count(trackName, sort = TRUE) %\u0026gt;%\n  top_n(15) %\u0026gt;%\n  mutate(trackName = reorder(trackName, n)) %\u0026gt;%\n  ggplot(aes(x = trackName, y = n)) +\n  geom_bar(aes(fill=n), stat=\u0026#34;identity\u0026#34;) +\n  scale_fill_distiller(palette=\u0026#34;Spectral\u0026#34;) +\n  xlab(NULL) + coord_flip() +\n  labs(y = \u0026#34;Count\u0026#34;, title = \u0026#34;Tracks I listened most to\u0026#34;, fill = \u0026#34;Count\u0026#34;) +\n  theme_minimal()\nIn contrast to my top artists, far more non-Bollywood songs feature in my top tracks. In fact, three of my top five tracks are non-Indian.\nIt is probably because I mostly listen to Bollywood and a select few from other genres; but when I do listen to other genres, I quickly grow to love them.\nTracks by Artists Considering my listening history of artists and tracks gives two different pictures, let me see which songs I really listen to by my top artists.\nOne line needs to be added for selecting the artist: filter(artistName == \u0026quot;KK\u0026quot;) before count() and after st.\n# by KK\nst %\u0026gt;%\n  filter(artistName == \u0026#34;KK\u0026#34;) %\u0026gt;%\n  count(trackName, sort = TRUE) %\u0026gt;%\n  top_n(15) %\u0026gt;%\n  mutate(trackName = reorder(trackName, n)) %\u0026gt;%\n  ggplot(aes(x = trackName, y = n)) +\n  geom_bar(aes(fill=n), stat=\u0026#34;identity\u0026#34;) +\n  scale_fill_distiller(palette=\u0026#34;Spectral\u0026#34;) +\n  xlab(NULL) + coord_flip() +\n  labs(y = \u0026#34;Count\u0026#34;, title = \u0026#34;Top tracks by KK\u0026#34;, fill = \u0026#34;Count\u0026#34;) +\n  theme_minimal()\nTime of Listening Finally, I want to know what time of the day I listen most. Now, I could’ve made a barplot, but then I remembered a neat “clock plot” I saw once on Reddit (r/dataisbeautiful). I wanted to replicate it with my own Spotify history.\nFirst, I will take the end times in a separate data frame for further use. I’ll reformat them in date-time format using as.POSIXct() and change the time zone to India’s Asia/Kolkata.\ndf = as.data.frame(st$endTime)\ncolnames(df) = \u0026#34;endTime\u0026#34;\ndf$endTime = as.POSIXct(df$endTime, format=\u0026#34;%Y-%m-%d %H:%M\u0026#34;, tz=\u0026#34;GMT\u0026#34;)\nattributes(df$endTime)$tzone = \u0026#34;Asia/Kolkata\u0026#34;\nThen, I’ll add additional columns in the dataframe for the period, hour and month of when I finished listening.\ndf$period = format(df$endTime, \u0026#34;%p\u0026#34;)\ndf$hour = format(df$endTime, \u0026#34;%I\u0026#34;)\ndf$month = format(df$endTime, \u0026#34;%b\u0026#34;)\nThen, I’ll modify the data frame and turn it into a count table by hours.\ndf_at = df %\u0026gt;% add_count(period, hour) %\u0026gt;% distinct(period, hour, n)\nFinally, I’ll make the clock plot. I use coord_polar() to convert into polar coordinates.
The theme() is used for cosmetic reasons.\nggplot(df_at, aes(x = as.factor(hour), y = n, fill = period)) +\n  geom_bar(stat = \u0026#34;identity\u0026#34;, position = \u0026#34;dodge\u0026#34;) +\n  coord_polar(theta = \u0026#34;x\u0026#34;, start = 0.26) +\n  xlab(\u0026#34;\u0026#34;) + ylab(\u0026#34;\u0026#34;) +\n  theme(axis.ticks = element_blank(),\n        axis.text.y = element_blank(),\n        panel.background = element_blank(),\n        panel.grid.major.x = element_line(colour=\u0026#34;grey\u0026#34;),\n        axis.text.x = element_text(size = 25),\n        legend.title=element_blank()) +\n  labs(title = \u0026#34;When do I listen to music?\u0026#34;)\nI listen most in the morning between 8 AM and 12 PM. As the day proceeds, my listening falls, virtually stopping in the afternoon. It picks up again in the evening and stays at a relatively constant level till around 10 PM.\nDid you like this small exploration? Why don\u0026rsquo;t you try it yourself and see what you find in your own music? Please feel free to contact me if you encounter any errors. My email: hello@harsh17.in.\n","permalink":"/exploring-my-spotify-listening/","summary":"I analyse my listening stats and patterns using R.","title":"Exploring My Spotify Listening"},{"content":"It is not uncommon — indeed, it is frequent — that non-profits acquire sums of money beyond what their operations need. Their calling to a cause and their popularity reward them handsomely. In their pursuit of generating enough funds for their cause, they often bite off more than they can chew. Consider the corpus — called by different names in different countries — of Greenpeace. Their fund balance is 38,316,000 euros with net income of 1,413,000 euros, or 3.7%. In a general sense, it means that they’re playing with very little money in comparison with their pocket size.\nI had three questions regarding these non-profits\u0026rsquo; finance philosophies:\nSince donations are one-way and donors hardly compare donations with their other investments (if I can even call donations investments), I believe non-profits\u0026rsquo; cost of capital (what you pay for one extra rupee you need) would be zero.\nIf so, would they take any and every project that “sounds good” to them, without much in-depth financial analysis? Won\u0026rsquo;t this lead to choosing the wrong projects?\nCorporations have an earnings redistribution decision to make: how they plan to return the money to their investors. As far as I know, there are no such decisions for non-profits. So, if we have a non-profit that has a huge corpus, keeps adding 2-5% of it every year and can\u0026rsquo;t find a suitable avenue to invest in, what should it do?\nI pondered over these questions. The more I thought about them, the more they confused me. So, I approached my finance professor, Prof Radha Ladkani. She first explained the related terms and concepts to me. Then we mulled it over and finally arrived at the following conclusions:\nFor non-profits, returns are not always tangible. For example, an organisation working on preventing farmer suicides would judge its efficacy by how many suicides it prevents rather than the economic return on investment, which could possibly be negative!\nFinancially, it makes sense to invest in all projects that generate a positive return greater than expected inflation. But would an organisation do that? Never. Why? 
Because it\u0026rsquo;s a qualitative judgement, and every non-profit’s investment has to be justified to its donors — or it loses them forever.\nI was trying to understand why my institute deemed it fit to replace one set of gleaming tiles with another set — an illogical investment in my opinion — instead of controlling the skyrocketing tuition fees. On a philosophical level, I supported the self-sufficiency of institutes, but how much forced donation was enough? The professor suggested this money was to be used for research funding and for supporting needy students in future, the way western institutes do it. I can’t really accept that. Unlike western institutes, whose corpora do support needy students with their tuition, IIM Indore doesn’t. It does unwillingly try — through NBFA, etc., that reduce 1% of the tuition fee for 0.1% of students — but that’s it. And research? Well, I don’t know about faculty research incentives, but it definitely doesn’t support student research.\nIs our qualitative judgement of investment so bad that we value adding some plants to the garden more than pushing our academic ethos? I’ll leave it to the reader to decide.\n","permalink":"/understanding-profits-of-non-profits/","summary":"Non-profits do end up with more capital than they need. What kind of earnings redistribution system do they need?","title":"Understanding Profits of Non-profits"},{"content":"When a new technology makes it to the headlines, it is vital to understand which of those are commercially viable and which are futile attempts at modernisation. The Gartner hype cycle helps understand the evolution of a technology and its use cases. Every year, Gartner releases multiple hype cycles – each focusing on some new technology. The hype cycle for Artificial Intelligence is one such tool (others being on emerging technology and marketing).\nThere are five stages in the hype cycle progression – the innovation trigger, the peak of inflated expectations, the trough of disillusionment, the slope of enlightenment and the plateau of productivity.\nInnovation trigger technologies are technologies that haven’t evolved beyond concepts. There aren’t any direct use cases, but the technology seems promising. The peak of inflated expectations occurs after the innovation is triggered. Now that the technology is highly marketed, everyone has expectations – which diverge from what the technology can actually deliver. There are hundreds of success stories, accompanied by thousands of failure stories.\nThe trough of disillusionment occurs when people’s interest in the technology wanes. People realise that these technologies just had great fanfare and aren’t that useful. Only the players with deep pockets and endurance continue with the technology. The slope of enlightenment is when the technology’s use cases and details start taking definite shape. Players slowly realise the costs and benefits of adapting and accommodating the technology.\nFinally, the plateau of productivity occurs when the technology becomes mainstream, generally accepted by businesses and taken for granted.\nConsider the computer (personal-use home computers, to be technically precise). In 1969, when the Honeywell Kitchen Computer was launched, it started the home computing era. But sadly, none were sold – thanks to its high price. Still, it started a movement around what a computing device could do. Everyone could sense it would make a difference, but no one had crystallised a vision of how. That could be termed the computer’s innovation trigger.
The computer was then marketed as a personal finance device, an automation device, and so on. It was expected to reduce costs manifold – marking its entry into the peak of inflated expectations. None of that was practically possible. Data entry for commercial purposes was cumbersome; home automation would keep the device always on, pushing up electricity bills and heating it up. People realised that the marketing might not be accurate, and disillusionment set in. However, in less than a decade (1977), the Apple II was launched. This changed the perception as well as the reception. People began realising that many processes were possible, and software like Lotus brought it mainstream. So the computer joined the slope of enlightenment, and soon it entered the plateau of productivity. The rest, as they say, is history.

But then there is a natural question: how do we use the hype cycle to our advantage? Typically, industry players consider how they can leverage the technology at their disposal – and the right time to do so. Different industries are affected differently by any new technology. Moreover, players within an industry have different risk appetites.

A risk-taking player is ready to reap the benefits of being an early adopter; they will adopt a technology in its innovation-trigger phase. A manager who invests heavily in cost-benefit analysis would probably wait out the peak of inflated expectations and the trough of disillusionment. A start-up would likely work with a technology at the peak of inflated expectations, whereas an established firm would probably invest during the trough of disillusionment. The slope of enlightenment is when every firm that can afford the technology adopts it. By the time it reaches the plateau of productivity, the technology is mainstream, and not using it could mean a competitive disadvantage.

According to Gartner's 2019 survey of 3,000 CIOs, more enterprises are entering the third era of IT infrastructure. A majority of the expenditure was targeted at increasing customer engagement via digital channels, making "digital business" a driving concern. The survey found that 33% of companies had scaled their digital endeavours, up from 17% the previous year. CIOs are "making a leap from IT-as-a-craft to IT-as-an-industrial-concern", says Andy Rowsell-Jones, VP, Gartner.

With a wide variety of interactions available through digital channels, organisations must expand beyond traditional engagements. This is further supported by the reduced cost and better consumer experience these channels offer. Corporate spending also remains relatively steady: CIOs, on average, expected their IT budgets to grow by 2.9%, slightly less than 2018's 3% growth rate.

This Gartner hype cycle highlights how AI is reaching organisations in many different ways.

Examples of technologies in each phase

AI Marketplace (Innovation Trigger)
An AI marketplace is an online marketplace for buying and selling algorithms based on Artificial Intelligence and Machine Learning, for anybody to use. A typical AI marketplace has algorithms spanning technologies like text mining, computer vision, and speech recognition, among others. These marketplaces intend to link AI services with one another, thus increasing the supply of AI-based technology for general use. Bundling services on the same (typically online) platform costs far less than traditional case-by-case AI deployment.

These marketplaces can be open source as well as proprietary.
For example, GenesisAI is open source, whereas Infosys provides the same as proprietary software. These developments are mostly on paper; however, the entry of Infosys is an exciting episode in the marketplace's progress.

It is difficult to predict to what extent the AI marketplace will grow. But considering the expected growth of the global AI market to $202 billion by 2026 (33% CAGR), it is destined to grow beyond expectations – typical of innovation-trigger technologies.

Chatbots (Peak of Inflated Expectations)
Chatbots are the basis of computer-human interactions in the present world. These interactions are mostly text-based; sometimes they are voice-based. The earliest chatbots (deployed in customer service to answer mundane questions) were simple decision trees (glorified flowcharts). Newer chatbots, however, have intelligent engines working behind them that try to understand individual users' demands.

There are success stories as well as failure stories, typical of a technology in this phase. Facebook, for example, allows businesses listed on it to use basic chatbots for free, which has significantly reduced the burden of some critical business functions. Google's Duplex is the latest example of how chatbots have developed in recent years, and of their likely influence in the future.

Chatbots are growing – and fast. According to a Market Research Engine report on the chatbot market, the global chatbot market is expected to explode in the coming years, exceeding $994 million by 2024 and growing at a CAGR of 27%. Banks, moreover, are likely to automate their customer interaction by over 90%.

In the US alone, users of smart speakers and assistants like Amazon Alexa, Google Assistant and Siri grew by 40% to reach 66.4 million in 2018. With the expected rise of smart speakers, this will only grow further.

Virtual Assistants (Trough of Disillusionment)
Virtual assistants are self-employed people who provide professional administrative, technical or creative assistance remotely, from home offices. They are usually independent contractors, typically employed as freelancers, and are therefore cheaper than traditional assistants. Their typical job includes managing their client's emails, travel schedule, and so on. A few years ago, they were thought to be transformational in personal assistance. However, these jobs are slowly losing their hype as the game-changer. Zirtual and VaVa Virtual Assistants are popular virtual-assistant portals.

In the US, there was a sudden boom in the hiring of virtual assistants. According to an infographic by Valuewalk, the number of people identifying themselves as virtual workers increased by 79.7%. Most of this was fuelled by low operational costs, higher productivity and the ability to telecommute to work. The supply side is filled by freelancers, most of whom are college students. In my personal experience in Latvia, I saw a large group of students taking this up as their summer job.

There are no current technologies in the slope of enlightenment phase (see Figure 2). Most technologies end up at or before the trough of disillusionment, with two in the plateau of productivity. There could be two reasons for this.

First, AI is itself a new buzzword. Most technologies haven't figured out their business uses. The experiments so far have required huge investments, without many scalable economy-wide or industry-wide applications. Second, the technologies that have reached the plateau are not new.
Speech recognition and GPU-based acceleration have been worked on for at least a decade. Further, the only technology expected to reach the plateau in less than two years is Robotic Process Automation, which has also been worked on for quite some time now. All other technologies will take at least two years to be exploited further.

Speech Recognition (Plateau of Productivity)
Speech recognition is a field of computational linguistics that develops algorithms to understand human speech and turn it into text understandable by machines, along with the computer-aided conversion of text to speech. The most basic speech recognition applications are voice-based interaction systems.

These technologies were once thought to be a game-changer, and perceptions changed with time: they are recognised as having strong use cases today. Examples include Google Assistant, Cortana, etc.

The first steps towards speech recognition were taken in 1952 by Bell Labs, whose machine Audrey could recognise the digits 0 to 9 with 90% accuracy (when spoken by its inventor), or 70-80% for other users. Development has since shot up, and today's computers are able to recognise millions of words with almost no difficulty.

A version of this article was submitted as an assignment in Prof Saurabh Kumar's course Information System for Managers at IIM Indore.
","permalink":"/gartner-hype-cycle/","summary":"Technologies come and go. The real question is can you predict its popularity?","title":"Gartner Hype Cycle"},{"content":"
Apple, as you very well know, is facing falling iPhone sales, for various reasons. One primary cause is the increased life of phones and the longer upgrade cycle. So, to maintain similar levels of profit, Apple either has to increase margins – perhaps the reason for the remarkably high prices of the current line – or it can learn a lesson on global expansion from the cigarette industry.

I see quite a similarity between now and the 90s for Apple. Apple is trying new things, like it did then – which in itself is a good thing – but to make the same profits, it's massively milking its existing consumer base. The current market in the US and Europe is quite saturated, as almost everyone there already owns an iPhone. To enable growth, Apple needs to find more people who will buy its products.

What I don't understand is the ignoring of India. India is the second-largest global market. After China, where Apple has quite a colossal presence, India should be a natural target. But Apple is not just failing to focus on acquiring new consumers; its efforts at serving existing consumers are also low-key.

To a certain extent, they're right: India has a vast population whose yearly income is less than the price of an iPhone. But that's not the whole story; there are still many people who would like to get an iPhone and can afford it – the top 25%. There is a reason why OnePlus's biggest market is India.

It's not that Indians don't like iPhones – they love them. Recently Apple started manufacturing the 6s in India, and it is the new hot cake. People are buying a four-year-old iPhone!

In Apple's defence, it can blame the high tariffs for such high prices, and internal regulations for not having any service centres. But that's not enough, considering OnePlus and Google have their service centres here, up and running.

About the high prices: I think Apple can reduce its margin for a while. Even if it can't do that, it could use some form of Apple financing.
Most people who can afford Rs 40,000 right now can afford to pay another Rs 20,000 over the next two years. But before all that, Apple should focus on improving its services here. There are no official Apple service centres in the country.
","permalink":"/iphones-in-india/","summary":"Why is Apple ignoring such a big market like India?","title":"iPhones in India"},{"content":"
I've spent some time in European museums. I've visited Riga, Vilnius, Warsaw, Oslo, Bergen, Prague, Berlin and Munich. They are immensely packed with a variety of artefacts and a pretty good amount of collected history. Each of them has some content – very niche, but content nonetheless.

However, I felt a severe lack of museums about pre-war history (excluding art museums). Most museums are relics of Nazi suppression and Soviet occupation; they focus far too much on contemporary history. For example, I read that 16th-century Riga was a major port and the local Baltic hub. Or about the Germanic tribes. I couldn't find museums that would tell me more about them. I'm not saying such museums don't exist – they do. However, they're not emphasised enough.

More importantly, I missed thematic museums. There are topical museums, but not thematic ones. By thematic museums, I mean museums built around ideas, like the "flea market" or "mythology". These would sample artefacts not limited by geography or period. These would be expansive. There are enough topical museums – WWII Jews, concentration camps, the Munich Science Museum, etc.

Given that you can get most information on the internet these days, a better idea would be to establish "cultural parks": places where you can go to experience another culture – its local settings, food, mythologies, living style, and so on. If there is one trend on the rise in tourism, it's experience tourism. More young people travel today to experience the local culture than to visit churches and cathedrals. This will only increase over the years.

This idea of "cultural parks" is not new. I remember reading about Helen Keller's visit to the World's Fair when she was a child. She was fond of Lord Ganesha (the elephant-faced Hindu god). These days, World Fairs are just about business-card exchanges. I can just imagine the facts the world would be surprised to learn about India, Japan, Ireland, Korea, and so on.

You enter a place. You visit mini Japan. You're offered Sencha, Matcha, Hojicha, and many other tea options. You are served in a Japanese hut, joined by a young woman in a kimono telling you Shinto stories. The garden is decorated with Ikebana. Next, you go to mini Uzbekistan. Have some horse riding. You eat plov and manti in a restaurant where Shashmaqam songs play in the background. Next, India. Next, Mexico. Next, Ghana. And so on. Won't it be wonderful? I would definitely sign up for a membership. They'll offer far too many attractions to cover in a day.

We need cultural parks – a collection of many cultures in one place. With rising western influence and Americanisation, this is needed more than ever. If not for education, then at least for cultural preservation. Moreover, such a park should have global branches, like Madame Tussauds does with wax statues.

Museums are old-fashioned. Wikipedia has already told me parts of history. I need to experience the present and the dying cultures across the world. I need cultural parks.
","permalink":"/museums-and-experience-tourism/","summary":"Museums are a thing of the past. Google has more over-the-top information than I need.
Why not aim for experience instead of expertise?","title":"Museums and Experience Tourism"},{"content":"
At times in your life, you will arrive at a point where you need to decide between two equally good-sounding options. You will feel one of the choices is "right". The other isn't. One of the picks will result in instant gratification. The other will give you long-term benefits. Arguments exist for both sides, but you, probably like me, remain confused.

I have thought over this question for a long time. It keeps biting me. I arrived at some conclusions; you're welcome to question them. They, and the rest of this article, are based on anecdotes and some limited literature.

Prof Biswanath Swain's philosophy class was my first introduction to these complicated concepts. He started the course by noting the difference between law and ethics. If you were to draw them as a Venn diagram, the circle of law would lie inside the circle of ethics. Therefore, even if something isn't explicitly prohibited, it isn't necessarily the ethical thing to do: what is legal constitutes only a portion of what's ethical. Even this rule isn't complete – some unethical things are legal, unfortunately.

Next, he taught us about the various ethical perspectives. The Law of Means. Aristotle. The egoistic approach. More like them. However, utilitarianism stood out to me; it was a bit different from the others. It understood that decisions lead to some people's happiness and the suppression of others' interests, and it suggested looking for the "maximum happiness of the maximum number". One counter-argument attacks it by saying this logic could justify genocides. That argument is overblown, but I got the idea: this theory could not give me an exactly correct answer to what is right.

A few years later, I took a business ethics live-workshop class by Profs G Venkat Raman, Swapnil Garg and Sneha Thapliyal.[1] We discussed the Volkswagen dieselgate fiasco, which was unfolding right before our eyes. Before taking that course, I had concluded that Volkswagen's managers were to blame for the debacle. However, after reading the case materials, I realised there was more to it. They were not the only ones who had done wrong.

The most striking point was when the company asked customers to bring their cars back so that Volkswagen could remove the questionable diesel device. The consumers refused. They didn't want low-mileage cars – who cares about pollution more than the government itself!

I read about the opinions of Martin Winterkorn[2] and other executives. They weren't grossly wrong – car companies need to sell their cars. Regulations on air pollution are stringent at one extreme end, and consumers demand high mileage at a low price at the other. The executives broke the law by hiding that the cars were polluting in the real world. However, it didn't affect them much as a company. Sales barely dropped in the US and remained more or less the same elsewhere.

Then I also watched the movie and read the case on Enron. A plethora of things went wrong there. Before the stock price toppled, it was sky high – an excellent position for shareholders. "The management's duty is to its shareholders", according to Milton Friedman.
Managers are supposed to make the choices that enhance shareholder value and wealth.

I am a management student, so I do picture myself in such a situation someday. What would I do? More importantly, what should I do?

To prepare for that day, I decided to build a framework for deciding what is ethical, based on my limited understanding. At the very least, I won't regret choosing an apple over an orange twenty years down the line. I borrowed a large part of it from the Bhagavad Gita, an excellent book on life that my mother first suggested I read; I have kept a copy with me ever since.

First, I should decide what my karma is — the purpose.[3] The Matrix (movie series) showed that even a program has its karma and purpose. I am here for a specific task, which I should pursue at all costs. A CEO must increase shareholder value. A family guardian must take care of family members. A police officer must guard the city and its residents. Yama must let people die when it's their time.[4] It's their sacred duty.

Then, I should understand that my stay here is forever. It is a long journey that I must undertake. I am tempted to invoke the immortal atman (soul) being reborn here, but that would make it very dramatic.

Keep in mind two things: your duty, and the axiom that everything is a part of the journey. They will help you decide what is right: when you have to choose between options X and Y, choose the one your duty points to, without forgetting the long-term impact of the decision.

To exemplify, consider the case of Volkswagen. Winterkorn and the other executives were supposed to do what would give their shareholders the highest returns. Then what did they do wrong? They undertook something illegal, something that would not enhance shareholder value in the long term. And as I stated earlier, shareholder value does not consist of wealth alone; it is more than that. They did something illegal in their business – their karma. So they failed in their karma.

But what happened with Enron? They didn't do something illegal in their business per se (at least in their minds). Their mistake was that they didn't think about the long-term impact. They increased shareholder value for a short period, but they harmed the company in the long run, knowing beforehand that inflating stock prices isn't a lasting solution. That's a violation of the second rule, that businesses last long.

These two rules (karma and immortality) will lead you to a definite answer in most cases. Right or not, that's another debate – who decides what's right anyway – but at least you won't have regrets.

[1] Venkat Raman, G., Garg, S., & Thapliyal, S. (2019). Integrative live case: A contemporary business ethics pedagogy. Journal of Business Ethics, 155(4), 1009-1032.
PDF.

[2] Martin Winterkorn was CEO of Volkswagen AG, the parent company of the Volkswagen Group, chairman of the supervisory board of Audi, and chairman of the board of management of Porsche Automobil Holding SE. Winterkorn was criminally indicted over the emissions-cheating scandal in the United States on 3 May 2018, on charges of fraud and conspiracy. In April 2019 he was criminally indicted on charges of fraud in Germany. He is currently a fugitive from justice in the United States and is wanted by the Environmental Protection Agency for conspiracy to defraud the United States, conspiracy to commit wire fraud, conspiracy to violate the Clean Air Act, and three counts of wire fraud. Source: Wikipedia.

[3] Many people, especially westerners, think of karma as the principle of cause and effect. That is somewhat true. Karma in Sanskrit stands for duty; it is only an obvious corollary that your actions determine the results. That doesn't mean only your actions determine the results – there are more factors at play. Hinduism says rebirth as a human occurs once every 8,400,000 births, which includes being reborn as a different species. I probably don't believe in rebirths; certainly not that my actions in a human life are going to determine my happiness as an amoeba in the next one.

[4] Yama is the Hindu god of death, dharma (righteousness) and the world of the dead. Wikipedia.
","permalink":"/how-to-decide-to-do-or-not-to-do/","summary":"Thinking about the most crucial dilemma of all time.","title":"How to decide: to do, or not to do?"},{"content":"
Linux Ubuntu

I kept installing and uninstalling Ubuntu until I shifted to a Mac. This list will help me keep the small Terminal tricks straight if I ever move back.

Use tlp for battery preservation.
Use gdebi for installing Debian (.deb) packages.
Install Dropbox, R, RStudio, TeX Live and Spotify. You'll need them.
","permalink":"/notes-on-linux/","summary":"sudo do not forget","title":"Notes on Linux"},{"content":"
My first academic publication: a peer-reviewed book chapter on statistical modelling using Gaussian processes. We reviewed several GP models and correlation structures, and methods to handle numerical instabilities due to near-singular matrices. Finally, we reviewed several algorithms developed specifically for analysing big data obtained from computer simulators.

Abstract

Over the last two decades, science has come a long way from relying only on physical experiments and observations to experimentation using computer simulators. This chapter focuses on the modelling and analysis of data arising from computer simulators. It turns out that traditional statistical metamodels are often not very useful for analyzing such datasets. For deterministic computer simulators, realizations of Gaussian Process (GP) models are commonly used for fitting a surrogate statistical metamodel of the simulator output. The chapter starts with a quick review of the standard GP-based statistical surrogate model and emphasizes the numerical instability due to near-singularity of the spatial correlation structure in the GP model fitting process. The authors also present a few generalizations of the GP model, review methods and algorithms developed specifically for analyzing big data obtained from computer model runs, and review the popular analysis goals of such computer experiments. A few real-life computer simulators are also briefly outlined here.
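The chapter itself contains no code, but to make the near-singularity point concrete, here is a minimal sketch of the standard GP surrogate workflow in R, using the GPfit package that Prof Ranjan co-developed. The toy simulator, design size and settings below are my own illustrative choices, not taken from the chapter.

library(GPfit)

# A cheap stand-in for an expensive deterministic computer simulator
simulator <- function(x) sin(2 * pi * x) + 0.2 * cos(8 * pi * x)

set.seed(17)
X <- matrix(runif(10), ncol = 1)   # 10-run design on [0, 1]
Y <- simulator(X[, 1])

# GP_fit estimates the correlation hyperparameters; nug_thres controls the
# lower bound on the nugget, which keeps the correlation matrix invertible
# when design points sit close together and the matrix is near-singular
fit <- GP_fit(X, Y, nug_thres = 20)

xnew <- matrix(seq(0, 1, length.out = 100), ncol = 1)
pred <- predict(fit, xnew)   # posterior mean (Y_hat) and MSE at new inputs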
Citation

Harshvardhan, M., & Ranjan, P. (2019). "Statistical Modelling and Analysis of the Computer-Simulated Datasets". In B. Gupta, & D. Agrawal (Eds.), Handbook of Research on Cloud Computing and Big Data Applications in IoT (pp. 202-228). Hershey, PA: IGI Global. doi:10.4018/978-1-5225-8407-0.ch011 [arXiv:2012.11122]

Links

PDF: https://www.harsh17.in/docs/simulation_2019.pdf
arXiv: https://arxiv.org/abs/2012.11122
Original link: https://www.igi-global.com/chapter/statistical-modelling-and-analysis-of-the-computer-simulated-datasets/225418
","permalink":"/statistical-modelling-and-analysis-of-the-computer-simulated-datasets/","summary":"My first academic publication: a peer-reviewed book chapter on statistical modelling using Gaussian processes. We reviewed several GP models and correlation structures, and methods to handle numerical instabilities due to near-singular matrices. Finally, we reviewed several algorithms developed specifically for analysing big data obtained from computer simulators. \u003ca href=\"https://www.harsh17.in/docs/simulation_2019.pdf\"\u003e🔗 PDF\u003c/a\u003e","title":"Statistical Modelling and Analysis of the Computer-Simulated Datasets"},{"content":"
I gave a presentation on LaTeX on 23rd November 2017 at IIM Indore – my home college. I claim no expertise on the subject; the presentation was aimed at kindling a spark in the students so they would start experimenting with LaTeX and, in a more general sense, at encouraging them to learn skills beyond their books.

I learned LaTeX last summer during my internship project with Prof Pritam Ranjan, IIM Indore. In the report, I had to use a lot of matrices, determinants and fractions, with many subscripts and superscripts. It was too complicated for me to find the exact symbol, then put the variables in order, and so on. Even when I got them right, I needed to align them separately! It was also too complicated to enter a matrix of dimension larger than 3×3. What to do? LaTeX was his answer.

I started learning it. Some YouTube videos, some extra cups of coffee and snacks, and voila! I was getting the hang of it. Using LaTeX saved me a lot of time and, of course, increased my productivity. If he hadn't given me the chance and the idea, I too would be just another person muddling with Word to get a document straight. So yeah, he is the person to whom I dedicate all this. Heartfelt and sincere thanks, sir!

Amongst the various fantastic takeaways from this internship, learning LaTeX was a unique one. Lately, I have also realised that LaTeX could not only complement my regular Microsoft Word usage but even replace it entirely. These days, I make my presentations, CV and resume in LaTeX too – all for a simple reason: it is smooth and beautiful.
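To give a flavour of why (a generic illustration of mine, not a slide from the talk): the expressions that are tedious in an equation editor take only a few transparent lines of LaTeX.

\documentclass{article}
\usepackage{amsmath} % for the bmatrix environment
\begin{document}

% Fractions, subscripts and superscripts in one readable line
\[ \hat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \]

% A matrix wider than 3x3, no clicking through symbol menus
\[ A = \begin{bmatrix}
  a_{11} & a_{12} & a_{13} & a_{14} \\
  a_{21} & a_{22} & a_{23} & a_{24} \\
  a_{31} & a_{32} & a_{33} & a_{34}
\end{bmatrix} \]

\end{document}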
Now I am quite a fanatic for LaTeX: if a document exceeds a page, LaTeX is my go-to tool. A small aside: Nawaz Sharif was recently ousted from his post (see this). The primary evidence against him was the use of Microsoft Word's Calibri font (see this).

"However, the declaration, dated February 2006, was typed in the Calibri font, which was not introduced until 2007 – raising suspicions that the document may have been forged." – Independent, UK.

So see, Microsoft changed their default font, and Nawaz Sharif got convicted. Had he been using LaTeX, he would be happily enjoying his tenure (and perhaps vacations)!

Here are the slides for your reference. You may drop in your questions, and I would love to help!
","permalink":"/presentation-on-latex/","summary":"My workshop on using LaTeX for IIM Indore community","title":"Presentation on LaTeX"},{"content":"
This workshop was organised by Syntax, the IT club of IIM Indore. I explained how LaTeX and Overleaf work to undergraduate management students. While other templates were briefly discussed, the workshop used the Deedy CV template to demonstrate LaTeX-based editing. Most participants had no prior exposure to LaTeX, so I concentrated on building the foundations before jumping to the resume. By the end of the workshop, every participant had built a resume for themselves.
","permalink":"/using-latex-for-resume-and-assignments/","summary":"Hands-on workshop on using LaTeX for Resumes","title":"Using LaTeX for Resume and Assignments"},{"content":"
I gave a presentation on LaTeX on 23rd November 2017 at IIM Indore – my home college. I claim no expertise on the subject; the presentation was aimed at kindling a spark in the students so they would start experimenting with LaTeX and, in a more general sense, at encouraging them to learn skills beyond their books.

I learned LaTeX last summer during my internship project with Prof Pritam Ranjan, IIM Indore. In the report, I had to use a lot of matrices, determinants and fractions, with many subscripts and superscripts. It was too complicated for me to find the exact symbol, then put the variables in order, and so on. Even when I got them right, I needed to align them separately! It was also too complicated to enter a matrix of dimension larger than 3×3. What to do? LaTeX was his answer.

I started learning it. Some YouTube videos, some extra cups of coffee and snacks, and voila! I was getting the hang of it. Using LaTeX saved me a lot of time and, of course, increased my productivity. If he hadn't given me the chance and the idea, I too would be just another person muddling with Word to get a document straight. So yeah, he is the person to whom I dedicate all this. Heartfelt and sincere thanks, sir!

Amongst the various fantastic takeaways from this internship, learning LaTeX was a unique one. Lately, I have also realised that LaTeX could not only complement my regular Microsoft Word usage but even replace it entirely. These days, I make my presentations, CV and resume in LaTeX too – all for a simple reason: it is smooth and beautiful.

Now I am quite a fanatic for LaTeX: if a document exceeds a page, LaTeX is my go-to tool. A small aside: Nawaz Sharif was recently ousted from his post (see this). The primary evidence against him was the use of Microsoft Word's Calibri font (see this).

"However, the declaration, dated February 2006, was typed in the Calibri font, which was not introduced until 2007 – raising suspicions that the document may have been forged." – Independent, UK.

So see, Microsoft changed their default font, and Nawaz Sharif got convicted.
Had he been using LaTeX, he would be happily enjoying his tenure (and perhaps vacations)!
","permalink":"/introduction-to-latex/","summary":"LaTeX workshop for first-time users","title":"Introduction to LaTeX"},{"content":"
Created bilingual audio-video content in Science and English for classes 6 to 8 by digitising NCERT textbooks.
Created, managed and maintained the program's Hindi website using WordPress for four months.
","permalink":"/smart-villages-indore/","summary":"Creating and deploying web-based content for the Government of Madhya Pradesh","title":"Smart Villages Indore"},{"content":"
I'm an Assistant Professor of Information Systems and Analytics at the American University of Sharjah in the United Arab Emirates. My research focuses on forecasting, machine learning, and applied analytics. I earned my PhD in Business Analytics at the Haslam College of Business, the University of Tennessee, advised by Dr. Chuanren Liu. My dissertation explored enterprise demand forecasting using scalable machine learning algorithms. Alongside my PhD, I spent over a year interning with HP Inc. in their SPaM team, working on supply chain and demand forecasting projects.

My academic journey began with a BA and MBA from the Indian Institute of Management, Indore, where I was advised by Dr. Pritam Ranjan, graduating in April 2021. During this time, I had the opportunity to be an ERASMUS+ scholar at the University of Latvia for a semester. Sainik School Tilaiya shaped much of who I am today.

When I'm not neck-deep in data, you'll often find me writing or lost in the wonderful world of books. For an adrenaline rush, I love participating in adventure sports. And nothing lifts my spirits like a good Calvin and Hobbes comic! Music is my constant companion, thanks to Spotify. You might enjoy my curation of Classical Hindi Music.

Curriculum Vitae · Resume · Google Scholar · hello@harsh17.in
","permalink":"/about/","summary":"\u003cimg src=\"/img/avatar.png\" alt=\"Harshvardhan\" style=\"float:right;width:180px;margin:0 0 1.5rem 2rem;border-radius:50%;\" /\u003e\n\u003cp\u003eI\u0026rsquo;m an Assistant Professor of Information Systems and Analytics at the \u003ca href=\"https://www.aus.edu/\"\u003eAmerican University of Sharjah\u003c/a\u003e in the United Arab Emirates. My research focuses on forecasting, machine learning, and applied analytics. I attained my PhD in Business Analytics at the Haslam College of Business, \u003ca href=\"https://haslam.utk.edu/business-analytics-statistics\"\u003ethe University of Tennessee\u003c/a\u003e, advised by \u003ca href=\"https://datamining.utk.edu/\"\u003eDr. Chuanren Liu\u003c/a\u003e. My \u003ca href=\"/phd/\"\u003edissertation\u003c/a\u003e explored enterprise demand forecasting using scalable machine learning algorithms. Alongside my PhD, I spent over a year interning with \u003ca href=\"/hp-blog-2023/\"\u003eHP Inc.\u003c/a\u003e in their SPaM team, working on supply chain and demand forecasting projects.\u003c/p\u003e","title":"About"}]