Eli Weinstock-Herman
"Awareness is about unlearning. It is the recognition
that you don't know as much as you thought you knew."
Scott Adams

A Custom Jasmine Runner to find my slowest Specs

Originally posted on Wednesday, December 21, 2016 at LessThanDot.com

I’ve been playing around lately with a pure command-line Jasmine runner that doesn’t rely on a SpecRunner file to run tests. I work daily with a largish application that has well over 100K lines of front-end code and more than 7,000 front-end tests. Over time, as the codebase and test count have grown, our Continuous Integration environment has gotten steadily slower. While build servers like Jenkins and TeamCity provide some analytics around slow tests, there is still some digging involved to identify the best targets for improvement, something I’m hoping a local runner can ...
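
Jasmine itself runs in JavaScript, so the following is only a language-neutral illustration of the analysis step, sketched in Python like the later posts. It assumes some reporter has already recorded a suite name, spec name, and duration in milliseconds for every spec; the `SpecTiming` record and both helper functions are hypothetical names, not part of Jasmine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SpecTiming(NamedTuple):
    """Hypothetical per-spec record; assumes a reporter already captured these."""
    suite: str
    spec: str
    ms: float


def slowest_specs(timings: Iterable[SpecTiming], n: int = 10) -> List[SpecTiming]:
    """The n individual specs with the longest durations, slowest first."""
    return sorted(timings, key=lambda t: t.ms, reverse=True)[:n]


def slowest_suites(timings: Iterable[SpecTiming], n: int = 10) -> List[Tuple[str, float, int]]:
    """(suite, total ms, spec count) for the n suites with the most total time."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for t in timings:
        totals[t.suite] += t.ms
        counts[t.suite] += 1
    ranked = sorted(totals, key=lambda suite: totals[suite], reverse=True)[:n]
    return [(suite, totals[suite], counts[suite]) for suite in ranked]
```

Summing per suite is one guess at what "best targets" could mean: a suite of many moderately slow specs can cost more CI time than a single very slow spec, and ranking individual specs alone would hide that.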

CSV file to API using Azure Functions (CSVaaS)

Originally posted on Friday, November 25, 2016 at LessThanDot.com

We’re living in the future. During a conversational aside the other day, the CEO recounted a story about someone he had met who was willing to throw money at a product that would make it easy to save an Excel file and have it surface as an API. A few years ago that meant server provisioning and a couple of days to a couple of weeks of work, depending on the level of analytics, authentication, identity management, documentation, data entry system, and so on that you wanted. With the explosion of tools and services we’re seeing in the cloud, we can now do this in a few hours or less, with 200 lines of code ...
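
As a rough illustration of the core idea only, here is a Python sketch that turns CSV text into JSON responses. It is not the Azure Functions programming model: `csv_to_records` and `handle_request` are hypothetical names, it assumes the spreadsheet has been saved as CSV with a header row, and it leaves out the authentication, analytics, and documentation pieces mentioned above.

```python
import csv
import io
import json
from typing import Dict, List, Optional, Tuple


def csv_to_records(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text (first row = column headers) into a list of dicts."""
    # newline="" lets the csv module handle quoted fields containing line breaks
    return list(csv.DictReader(io.StringIO(csv_text, newline="")))


def handle_request(csv_text: str,
                   key_column: Optional[str] = None,
                   key_value: Optional[str] = None) -> Tuple[int, str]:
    """Return (status_code, json_body) for a GET against the CSV 'table'."""
    records = csv_to_records(csv_text)
    if key_column is None:
        return 200, json.dumps(records)          # no key: return every row
    matches = [r for r in records if r.get(key_column) == key_value]
    if not matches:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(matches[0])           # first row whose key matches
```

In an Azure Function, or any other HTTP host, the handler would load the CSV from wherever the file is stored and return the status code and body; that wiring is deliberately left out here. Note that every value comes back as a string, since CSV carries no type information.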

Automated Keyword Extraction – TF-IDF, RAKE, and TextRank

Originally posted on Monday, November 21, 2016 at LessThanDot.com

After initially playing around with text processing in my prior post, I added an additional algorithm and cleaned up the logic to make it easier to perform test runs and reuse later. I tweaked the RAKE algorithm implementation and added TextRank into the mix, with full sample code and links to sources available. I’m also using a read-through cache of the unprocessed and processed files so I can see the content and tweak the cleanse logic.
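
For readers unfamiliar with RAKE, here is a minimal Python sketch of the basic algorithm as originally published: split the text into candidate phrases at punctuation and stopwords, score each word as degree / frequency, and score each phrase as the sum of its word scores. This is not the tweaked implementation from the post; the tiny `STOPWORDS` set is an illustrative assumption, and a real run would use a full stopword list.

```python
import re
from collections import defaultdict
from typing import List, Tuple

# Illustrative stopword list only; real runs would use a much fuller list.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def candidate_phrases(text: str) -> List[List[str]]:
    """Split text into phrases at punctuation and at stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current: List[str] = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str, top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank candidate phrases by the sum of deg(w) / freq(w) over their words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in this phrase, including itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

The degree / frequency ratio favors words that tend to appear inside longer phrases, which is why RAKE leans toward multi-word keyphrases.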

The first step was to increase my hands-on knowledge of text processing and identify potential algorithms. I used Python as the programming...
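
TF-IDF is the usual baseline among the three algorithms in the title. A minimal Python sketch follows, assuming the plain `log(N / df)` inverse document frequency (libraries such as scikit-learn default to a smoothed variant) and a simple regex tokenizer with no stopword removal; `tokenize` and `tf_idf` are illustrative names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(documents: List[str], top_n: int = 5) -> List[List[Tuple[str, float]]]:
    """For each document, return its top_n terms by tf * idf."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # document frequency: number of documents containing each term at least once
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_n])
    return results
```

A term that appears in every document, like "the", gets an IDF of zero, which is why TF-IDF needs a corpus of posts rather than a single post in isolation.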

Python 3.5+: Unicode output for Windows Console

Originally posted on Sunday, November 20, 2016 at LessThanDot.com

I’ve tripped over this on 2 machines now and ended up at the same out-of-date Stack Overflow post both times, so maybe this will help someone else.

Situation
When you try to print a string with a Unicode character to the console on Windows in Python, you get:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 20-21: character maps to <undefined>

Looking something like this:

Fix
The Windows command line supports Unicode now, and Python 3.6+ ties into this support automatically (earlier versions require the win-unicode-console package). You just need a slight bit of magic dust.

Add an...
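
The error comes from encoding text with the console's legacy code page (for example cp437 or cp1252); Python implements those code pages as charmap codecs, hence the 'charmap' in the message. The sketch below covers both halves: a check for whether a string survives a given encoding, and an opt-in switch for Python 3.5 that assumes the third-party `win_unicode_console` package is installed. The helper names are hypothetical, and this is not necessarily the exact fix the full post describes.

```python
import sys


def can_encode(text: str, encoding: str) -> bool:
    """True if `text` can be encoded with `encoding` (e.g. a console code page)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def enable_unicode_console() -> None:
    """On Windows with Python < 3.6, route console I/O through the Unicode console APIs."""
    if sys.platform != "win32" or sys.version_info >= (3, 6):
        return  # not Windows, or 3.6+ already uses the Unicode console APIs (PEP 528)
    try:
        import win_unicode_console  # third-party: pip install win_unicode_console
    except ImportError:
        return  # package not installed; leave the streams unchanged
    win_unicode_console.enable()
```

Call `enable_unicode_console()` once at startup, before printing anything. The 3.6 change applies to the interactive console; output redirected to a file or pipe is not covered by it and still depends on the configured I/O encoding.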

To Build Automatic Bookmarking – Unsupervised Text Classification

Originally posted on Monday, November 7, 2016 at LessThanDot.com

I’ve been bookmarking all of my online reading for the past 7 years and recently started thinking about using that dataset to dig into trends in my past reading and potentially build a model to start scoring content I haven’t read yet. Even though I have manual keywords for each entry, I decided to look into what I could get with unsupervised text classification techniques to balance out the fact that I had entered those labels over long periods of time.

My first goal is to programmatically extract keywords from consistently formatted blog posts on static website pages.

I learn...
