Friday, March 3, 2017

My Takeaways from the 2017 Tapestry Conference

On Wednesday I had the pleasure of attending the annual one-day Tapestry Conference for the first time. I was blown away by many things - the nearly equal representation of genders, the quality of thought presented by speakers, and the desire of attendees from many disciplines to experience the day together and improve, question, and share their practice. In an effort to synthesize my notes and reflect further, I'm going to share my key takeaways from some of the speakers. While I know these bullets could never replace the richness of experiencing the talks live, with carefully curated examples enriching each point, I hope they spark your own “aha” moments and questions.


Lena Groeger
News Apps Developer, ProPublica
  • Data doesn't speak for itself - it reflects your thinking.
  • Visualizations are not neutral and one design rarely fits all realities.
  • Sometimes it's harmful to reduce individual people to dots.
  • Provide context to your visualization - e.g. flaws, source, what “good” or “bad” mean.
  • Users are not always who we imagine so design against bias by diversifying your teams, seeking peer review, or consulting an expert.
  • Have procedures against bias such as diverse user testing.


Catherine D'Ignazio
Assistant Prof of Data Visualization & Civic Media, Emerson College
  • Creative data literacy recognizes that citizens and non-specialists need alternative pathways into working with data that connect to their lived realities.
  • Think about a community survey. How can you take a lot of data, synthesize the information, and deliver it in a creative way for non-data specialists?
    • Her students followed this process:
      1. Divide and Conquer: Subsets of the data were assigned to people.
      2. Analyze: Each subset was then analyzed using a tool called databasic.io - it's free, check it out!
      3. Themes: The analysis was used to identify themes.
      4. Visualize: Each theme became a question for the community, paired with an appropriate visualization (in this case, GIFs) to illustrate it.


Nathaniel Lash
Data reporter, Tampa Bay Times
  • Be careful of getting enamored by technology and focus on the story you’re telling.
  • The story is not the tool, the story is the data.
  • Avoid distractions; remove unnecessary graphics, tooltips, etc.
  • Sometimes, the design may unfold as the story unfolds.


Cole Nussbaumer Knaflic
Author & Speaker, storytelling with data
  • Great stories include repetition, story, and pictures.
  • The typical business presentation follows a linear path but that can be a selfish path.
  • The same information can be delivered in an arc where you start with the plot, have rising action, climax, falling action, and the ending.


Matthew Daniels
Editor, Polygraph
  • Question the accuracy of an analysis. Matthew used the example of the Bechdel test. The test analyzes whether a work of fiction features at least two women or girls who talk to each other about something other than a man or boy. It has received a lot of criticism.
  • So what can we do? Fix the test and address the concerns. Matthew decided to analyze the film data in a different way and attempt to make it better (see http://poly-graph.co/bechdel/).
  • Once he made his analysis public there were two reactions: 1) There was an echo chamber effect, and the people who liked the Bechdel test before still agreed with his results, and 2) There was still divisive rhetoric, and the people who disagreed still found issues and critiques with the new analysis.
  • So was it a waste of time? No. New perspectives and continued dialogue are needed to move the needle forward.
  • Matthew recommends reading “Zen and the Art of Motorcycle Maintenance”.


Michelle Borkin
Assistant Professor, Northeastern University
  • Michelle and others ran an experiment to measure encoding, recognition, and recall of data visualizations.
  • Visualizations that are memorable “at-a-glance” have memorable content.
  • Titles and text are key elements in visualization and help recall the message.
  • Human recognizable objects (e.g. pictograms) can help with the recognition or recall of a visualization.
  • Redundancy helps with visualization recall and understanding.
  • The most memorable had visual associations whereas the least memorable had more semantic associations.
  • Interestingly, when it came to color, people remembered the segments the color created but not the colors themselves.
  • Their data visualization library database is open to your use: http://massvis.mit.edu/
  • For more detail on the study check out: http://www.csail.mit.edu/node/2628
  • Also, here’s an interesting rebuttal of the findings by Stephen Few: https://www.perceptualedge.com/blog/?p=1770


Neil Halloran
Filmmaker, Higher Media
  • We have a greater emotional response to individuals, not statistics.
  • Emotion is not scalable and there’s an upper bound to how much we can care for strangers.
  • Stressing how big the numbers are is more important than the number itself.
  • How do we bring the humanity back to the numbers?
  • The status quo is to focus on individual stories, but the flaw is that it doesn’t give us the connection to the bigger picture.
  • Hans Rosling does this well and shows that the data visualization itself doesn’t have to be complex to build that emotional connection.
  • Think of when you play a note on a piano. You feel something and when you play it for others, they feel it too.
  • How do you make data feel big?
  • Really think about the best way for someone to feel and understand the story, and do the thing that’s better. For example, if the story would land better presented live, present it - or create a video.
  • Do we feel compelled to be interactive because we can? Is that always the right design?
  • We have a tendency to treat misunderstanding, or lack of understanding, as a data literacy problem, but that’s blaming the audience. As designers, it’s too easy to push the blame onto the user.

Aside from the great speakers, the attendees offered a wealth of knowledge as well. They came from many different industries including government, journalism, academia, and design. Because we experienced every session together, each opportunity for one-on-one conversation led to really thought-provoking reflection and ideas around the several questions that arose throughout the day. While many examples were presented that expressed the speakers’ points, I found I left the conference with more questions than answers, and I think that’s okay. The presentations already sparked new ideas for me and I gathered a lot of helpful thoughts from others too.

Tuesday, November 15, 2016

Using Python for Sentiment Analysis in Tableau

This week's Makeover Monday data set was the Top 100 Songs' lyrics. I had just returned from Tableau's annual conference and was eager to try their new feature, TabPy, so this seemed like the perfect opportunity to test it out. In this blog post, I'm going to offer a step-by-step guide on how I did this. If you haven't used Python before, have no fear - this is definitely achievable for novices - read on!

For some context before I begin, I have limited experience with Python. I recently completed a challenging but great course through edX that I'd highly recommend if you are looking for foundational knowledge - Introduction to Computer Science and Programming Using Python. The syllabus included advanced Python topics such as classes and thinking about algorithmic complexity. However, to run the analysis I did, it's enough to look up and understand at a high level:

  • basic for loops
  • lists
  • dictionaries
  • importing libraries

The libraries I used for this, should you want to look up additional documentation, are:

  • pandas
  • nltk
  • time (this one isn't really necessary - I just used it to test computation time differences between TabPy and local processing.)

I have a Mac so if you're trying to reproduce with a PC, you'll find install instructions here as well.

Part 1 - Setting Up Your Environment

1. Make sure you are using Tableau v10.1
2. Open the TDE with the Top 100 Songs data
3. Install TabPy

Read through the install directions. Here's my simplified version for those not comfortable with GitHub or command line:
- Click the green "Clone or Download" button.
- Select Download
- Unzip the file and save locally (I moved mine to my desktop)
- Open your Terminal and navigate to your TabPy folder. It should contain a file named setup.sh. Run this command: bash setup.sh
If you see the following after your install finishes, you're all set:
INFO:__main__:{"INFO": "Loading state from state file"}
INFO:__main__:{"INFO": "Initializing tabpy"}
INFO:__main__:{"INFO": "Done initializing tabpy"}
INFO:__main__:{"INFO": "Web service listening on port 9004"}
Now if you're like me and your first attempt isn't successful, it may be because you have Python 3 and not the required Python 2.7. Or you have both versions but your primary is Python 3 - this is what happened to me, as I had Anaconda previously installed (it's part of the TabPy download) and had been using Python 3 for the class I took.


You can manually create a Python 2.7 environment (courtesy of Bora Beran). In your terminal, run:

conda create --name Tableau-Python-Server python=2.7 anaconda

Then activate it (on a Mac: source activate Tableau-Python-Server) and do the pip installs from the local folders:

pip install -r ./tabpy-server/requirements.txt
pip install ./tabpy-client
pip install ./tabpy-server

Part 2 - Connecting to TabPy in Tableau

Now it's time to set up TabPy in Tableau. In Tableau 10.1 go to:

Help -> Settings and Performance -> Manage External Service Connection

and enter localhost since you're running TabPy on your own computer. The default port is 9004, so unless you manually changed it, you should leave it at that.


Part 3 - Creating your TabPy Calculation
The TabPy GitHub page has extensive documentation you should review on using Python in Tableau calculations. I simply repurposed one of the calcs they demoed during the TabPy session at #data16 - catch the replay here.

Using the Top 100 songs data set, create the following calculated field:
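A minimal sketch of what such a field can look like, assuming nltk's VADER analyzer in the spirit of the #data16 demo (the use of the compound score and the exact wiring are my assumptions, not necessarily the original calc):

SCRIPT_REAL("
# Score each word with nltk's VADER sentiment analyzer.
# One-time setup on the TabPy machine: nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
# _arg1 arrives as a list of words; return one compound score (-1 to 1) per word
return [analyzer.polarity_scores(word)['compound'] for word in _arg1]
",
ATTR([Word]))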

Everything following # is a comment just to help make sense of what the code is doing. Feel free to remove that text.

Now you can use this calculated field in views with [Word] to process the sentiment score! The downside is that since this is a table calculation and also uses ATTR, you cannot use it within a level of detail (LOD) calculation. So unfortunately, you cannot sum the sentiment at the level of detail of song using this example and data structure. With some data manipulation it is possible, but I won't be diving into that.

TabPy vs. Pre-Processing Data for Tableau

Unfortunately, you cannot publish vizzes using TabPy to Tableau Public. If you want to download the .twbx version I made using TabPy, you can do so here.

However, you could run this analysis outside of Tableau and simply import the output and create your viz that way. I did this, which also gave me more flexibility with LODs since I was no longer using TabPy.

TabPy definitely took me less time and required less code. However, it did take ~2.5 minutes*** to process 8,668 words, whereas when I ran my code (below) outside of Tableau it took under 1 second to get the scores and write them back to a CSV.

***11/17 Update: Bora Beran made a great point; be mindful of how you're addressing your TabPy Table Calc - "If you have all your dimensions in addressing we will make a single call to Python and pass all the data at once which will be much faster. Otherwise we make one call per partition. If e.g. song title is on partitioning we would send a separate request for each song. If word is on partitioning we will send a separate request per word." 

At the time of posting this blog, I was addressing all dimensions in the view, and on a few occasions when working with this data I experienced the very slow result return time stated above. However, today this calc took about the same time in Tableau as it did outside of Tableau. I don't have a clear explanation, but I was running that query on my local machine and think it might simply have been due to limited resources at the time.

Below is what the code would look like outside of TabPy. You can run this code in a Jupyter notebook or another IDE - I used Spyder only because I used it for my class.
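Here's a minimal standalone sketch under the same assumptions - VADER scoring of each word, written back to a CSV for Tableau (the file and column names are hypothetical; adjust them to match your data):

import time

import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-time setup: nltk.download('vader_lexicon')

start = time.time()

# Hypothetical input file: one row per word of lyrics, with a 'Word' column.
words = pd.read_csv('top_100_song_lyrics.csv')

# Score each word; the compound score ranges from -1 (negative) to 1 (positive).
analyzer = SentimentIntensityAnalyzer()
words['Sentiment Score'] = words['Word'].apply(
    lambda w: analyzer.polarity_scores(str(w))['compound'])

# Write the scored words back out for Tableau to consume.
words.to_csv('top_100_song_lyrics_scored.csv', index=False)

print('Scored {} words in {:.2f} seconds'.format(len(words), time.time() - start))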

You can download my Tableau Public viz, which uses the output of this code, to inspect further!


Here's the final viz - half of it is cut off so be sure to view it in Tableau Public:


Monday, October 10, 2016

Educational Backgrounds of Data Industry Professionals

As some of you may know, back in March I co-founded a Bay Area group with Chloe Tseng for women who work in the data industry. These past 8 months have been extremely rewarding for me. Not only have I been sharpening my organizing and community-building skills, but I've also built an amazing network of friendship and support.

One thing that continually comes up in our conversations is the idea of "non-traditional" vs. "traditional" educational backgrounds - "traditional" referring to professionals who have a STEM (i.e. Computer Science, Statistics, Math, etc.) background, versus those of us who have a degree in a liberal arts field (i.e. Communications, Business, PoliSci, etc.). It's really interesting to see how this manifests in the types of struggles people face. Speaking personally, I studied Political Science and had a few non-data jobs before entering this space. I've always felt a bit behind peers who have STEM backgrounds, which causes me stress and a lot of after-hours self-learning to "keep up". I've connected with many others who feel similarly. However, we have had amazing guests and attendees at our events with master's or doctorate degrees in Statistics and other STEM fields. They are also very driven and take self-improvement just as seriously.

It's very difficult to find data on this divide* - especially data specific to people who work in the data industry. I've looked at statistics from the Department of Education and know that in the mid-1980s, 37% of computer science majors were women, whereas in 2013-14 only 18% were. But the reality is that there are many people in our space who don't have a Computer Science degree, or have a different STEM degree, and they're very successful in their craft.

In an effort to learn more about data industry professionals and their backgrounds, I've created a short 8-question survey that I would appreciate you taking and sharing on your social networks and with your colleagues. I'm hoping to get a good sample size and plan to use some of this information in a conference presentation.

Survey: https://goo.gl/forms/e7hZfbbeNEmgRmd83




* I did find a survey about new coders. It's a great data set too for analysis! http://bit.ly/2dRIqNO

Saturday, June 4, 2016

Resources for Self-Improvement for Data Industry Professionals

Over the last year, as my social engagement increased, I started to receive many messages along the lines of the following:

  •  "I came across your Tableau profile/blog/Twitter and as a new user I would love to know your journey/resources you used to learn the tool."

  • "How did you enter the data field with a background in political science?"

  • "I came across your profile and was very impressed with your achievements and career path as you have grown into the Business Intelligence field. What advice would you have to a new comer?"

  • "As someone who came from a non-technical background and quickly grown into the BI field successfully, I am wondering if you would share your experiences and tips." 

For a while I felt a bit out of place receiving the compliments and struggled to realize I had a point of view that could be valuable to others in their own career progression. With the support of my peers and the amazing women I get to engage with in the Tableau/data community, I have gained a new confidence and comfort in sharing my techniques. I've always been a compassionate person and I truly love helping others. Instead of keeping all the tips and tricks that have helped me grow hidden in private messages, I thought I would share them with you publicly!


1. Always surround yourself with people who are smarter than you. I have always been intimidated by folks I aspire to be like, but have found that when I overcome that shyness and seek out their help, they usually like to share it. I have two categories of these people. First, the peers I interact with regularly - perhaps people at work who are senior to me. With these folks, I seek mentors and try to meet on a cadence; I also go to them for advice on an as-needed basis. Second, what I call "stretch connections" - people who are many levels senior to me, experts in the industry, etc. I connect with these people via LinkedIn and Twitter. I usually introduce myself, compliment the aspects of their work I admire, and see if we can chat over the phone or meet over coffee if we're located in the same place. I use these opportunities to discover how they've found success, their thoughts on the field, etc. Through this, I've met founders and data scientists across the U.S. and met with people at companies I dream of working at.

2. Take on opportunities you're afraid of. I always find it easy to doubt my capabilities, and in my last two jobs I had fears about the level of work and the expectations of my performance. However, I've always felt that being under pressure and forced to step outside my comfort zone accelerates my learning. I'm able to quickly understand new technologies and apply them. I can learn more in a few months of hands-on experience than in years of higher education.

3. Have a growth mindset. A good book on the subject: http://www.amazon.com/Mindset-The-New-Psychology-Success/dp/0345472322. Basically, believe that your knowledge isn't fixed and that you always have the ability to learn new things. Some things may not come as quickly as others, but with hard work and persistence you can succeed in your endeavors. With that, always be cognizant that your current understanding is limited and that there is always more to learn. Be humble enough to recognize this and always strive to be better.

4. Have a point of view. Recently I've become more involved in the public space - tweeting and sharing my work on Tableau Public. As I've increased my social presence, I've increased my clout and have seen more people viewing my LinkedIn page, reaching out to me with opportunities, citing my work in their articles, etc. Gaining credibility in the field really pushes you to develop your skills further and often widens your circle.



New(er) to business intelligence, Tableau, or data in general? Looking to move into this career path? Here are some of the tools that have helped me along the way!
  • Books: 
    • Data Viz - Information Dashboard Design by Stephen Few for getting a quick foundation in data visualization design principles and really practical insights. I've honestly only read this and another one of his books, Now You See It. I just started Alberto Cairo's The Truthful Art. Check out Andy Kriebel's curated list of suggested books here!
    • Stats - Naked Statistics by Charles Wheelan
    • Business - Creativity, Inc. by Ed Catmull; Seven Steps to Mastering Business Analysis by Barbara Carkenord
    • Books on my shortlist to read - Mindset: The New Psychology of Success by Carol Dweck; The Signal and the Noise: Why So Many Predictions Fail - but Some Don't by Nate Silver
  • Podcasts: I dabble in a lot of podcasts, but the two favorites I listen to most consistently are:
    • Partially Derivative
    • Freakonomics
  • Newsletters: I subscribe to a few newsletters that send me really interesting datasets or data-oriented stories.
  • Engage via
    • Twitter - The Tableau community is particularly active on Twitter. I use Twitter to ask questions, learn from others, and stay abreast of the latest trends in the data viz industry. There are so many people I could recommend following, but as a start, check out the Tableau Social Media Ambassadors.
    • Tableau Public - Create an account and start sharing your work. Solicit feedback from those you admire. Download other folks' dashboards and reverse engineer them.
  • Meetups
    • Meetup and Eventbrite are teeming with industry events.
    • Don't see the group you're interested in? Start your own! I recently started a women-in-data meetup group in the San Francisco Bay Area and have met so many amazing people as a result. If you're interested in starting a group and want to know more about my experience and lessons learned, reach out!

I hope that this has proved valuable to some of you. Self-improvement is an important value I've always held close. I would love to learn what your best advice is - comment below or tweet me!



About

My name is Brit and I specialize in business intelligence and am located in the San Francisco Bay Area. I am an avid Tableau vizzer and lover of data. By day, I am a Data Visualization Consultant with Slalom. Outside of work I'm fascinated by patterns in human behavior, the impact of shared economies, space and the quantified self. I was inspired to start this blog after attending the "Every Data Rockstar Needs a Stage" blogger panel at the 2015 Tableau Conference. My goal is to use this blog as a platform to connect with the community while contributing new content, to encourage me to practice, as a space to try new things and to explore data sets of interest.
