Workshop - Automating phone-calls and SMS

Many apps, such as Twitter, Uber and Facebook, use your phone number as your main or only user ID. These apps require you to verify your phone number as part of the registration/login process.

If you do the same, or are considering doing so, you can't afford downtime in your automated phone call / SMS service, and you must have full, production-grade end-to-end tests constantly monitoring those flows.

This workshop will provide you with the background, the tools and basic hands-on experience to fully automate any kind of registration/login/MFA flow that involves SMS and/or automated phone calls, using a few simple, ready-to-use, open-source samples.


Outline/Structure of the Workshop

  • Short intro
  • The usage of SMS & automated phone calls on web and mobile apps (15 minutes)
    • Phone number verification / registration process
    • Bot detection / blocking
    • User account security
  • Common practices (15 minutes)
    • SMS with code
    • SMS with link
    • Automated phone calls that read / receive an X-digit code
  • Automating the reception of SMS messages (5 minutes)
    • The tools: hardware vs. using an API
    • Live-demo of automating SMS reception
  • Automating the reception and transcription of automated phone calls (45 minutes)
    • The tools we use
      • Docker
      • Asterisk
    • Very short intro to VoIP and Asterisk
    • Live, hands-on session (using provided code samples):
      • Setting up your Asterisk to receive a phone call
      • Recording a call
      • Transcribing a call
        • Using Google Speech API
        • Open source alternatives
      • Performing the above within a Selenium test case.
  • Summary + Q&A (10 minutes)
    • How to do this at scale?
      • Synchronizing resources
      • Using multiple phone numbers
    • Automating calls in a continuous-integration environment.
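
To illustrate the "Synchronizing resources" and "Using multiple phone numbers" items, here is a minimal sketch of one possible approach, not the workshop's own code: a small pool that lends each phone number to a single test at a time. The class name `PhoneNumberPool`, the `lease` method and the phone numbers are hypothetical, and the sketch assumes the parallel tests run as threads inside one process.

```python
import queue
from contextlib import contextmanager
from typing import Iterator, List


class PhoneNumberPool:
    """Hypothetical thread-safe pool: each number is lent to one test at a time."""

    def __init__(self, numbers: List[str]) -> None:
        self._available: "queue.Queue[str]" = queue.Queue()
        for number in numbers:
            self._available.put(number)

    @contextmanager
    def lease(self, timeout_seconds: float = 60.0) -> Iterator[str]:
        # Blocks until a number is free; raises queue.Empty if none frees up in time.
        number = self._available.get(timeout=timeout_seconds)
        try:
            yield number
        finally:
            # Always hand the number back, even if the test failed.
            self._available.put(number)


if __name__ == "__main__":
    # Placeholder numbers, for illustration only.
    pool = PhoneNumberPool(["+15555550100", "+15555550101"])
    with pool.lease() as phone_number:
        print("Running the registration test with", phone_number)
```

A pool like this only coordinates tests within one process. Tests spread across several CI machines would need a shared store or lock service instead, which this sketch does not cover.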

Learning Outcome

The attendees will finish the workshop with the basic foundations and tools to perform the following:

  • Programmatically receive an SMS message
  • Programmatically receive a phone call
  • Programmatically record a phone call
  • Programmatically transcribe a phone call using Google's Speech Recognition API
  • Do all of the above within a Selenium/Appium test case
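
As a rough illustration of the first outcome, the sketch below polls for incoming SMS messages and pulls a numeric code out of the message text. It is a minimal sketch under stated assumptions, not the workshop's sample code: `fetch_messages` is a hypothetical callable standing in for whatever SMS reception API or hardware is used, and the 4 to 8 digit code length is an assumption.

```python
import re
import time
from typing import Callable, Iterable, Optional

# A standalone run of 4 to 8 digits; the length range is an assumption.
CODE_PATTERN = re.compile(r"(?<!\d)(\d{4,8})(?!\d)")


def extract_code(message_body: str) -> Optional[str]:
    """Return the first 4-8 digit code found in an SMS body, or None."""
    match = CODE_PATTERN.search(message_body)
    return match.group(1) if match else None


def wait_for_code(
    fetch_messages: Callable[[], Iterable[str]],
    timeout_seconds: float = 30.0,
    poll_interval_seconds: float = 1.0,
) -> str:
    """Poll fetch_messages() until some message body contains a code."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        for body in fetch_messages():
            code = extract_code(body)
            if code is not None:
                return code
        if time.monotonic() >= deadline:
            raise TimeoutError("no verification code received in time")
        time.sleep(poll_interval_seconds)
```

Inside a Selenium or Appium test, the returned code would then be typed into the verification field. Passing `fetch_messages` in as a parameter keeps the waiting logic independent of the SMS provider, so a fake inbox can be used when testing the test itself.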

Target Audience

People with automation experience who'd like to learn / add phone calls & SMS automation to their tool belt

Prerequisites for Attendees

  • Familiarity with Selenium
  • Intermediate coding skills in a language that has a Linux-based compiler/interpreter/VM (Python, Java, Node.js, etc.)
  • A laptop with:
    • Docker installed
    • An IDE to the attendee's liking
  • Preferably: a Google Cloud Platform account (not a must)
Submitted 1 year ago

Public Feedback

  • Pooja Shah
    By Pooja Shah  ~  1 year ago

    Interesting content. Thanks, Or, for proposing. Just out of curiosity, I would like to know whether any of the tools used are open source. Twilio etc. are popular paid services, in my understanding. Mentioning open-source options gives most attendees more value and more inclination to put the talk to practical use.

    Also, do you plan to provide a public link to the sample code used in the demo?

    And in my understanding, a 90-minute workshop would be more awesome for this content. What do you think?

    • Or Polaczek
      By Or Polaczek  ~  1 year ago

      Hi Pooja!

      • The presentation uses 3 tools:
        • SMS reception API
        • Asterisk for the phone calls (Asterisk is open-source software)
        • Speech Recognition API.
      • The demos in this session use Twilio as the SMS reception API and Google's Speech API. Both APIs can be replaced with open-source software, plus hardware for the SMS reception. However, as this is not as scalable as using a third-party API, the examples I show use the paid versions in order to provide CI-grade solutions.
      • I'll be happy to add a couple of 'do it yourself, for free' slides :)
      • Asterisk and Selenium are the largest players in this presentation and they're both open-source.

      All the code samples I use in my demos are always open-sourced and public - it's an ideological thing :D
      The samples will be very similar to those I used in SeConf Berlin:

      I was wondering whether to suggest a 45-minute talk AND a full-day workshop or just a 90-minute workshop. We could definitely shoot for a 90-minute workshop, but I think it'll be more difficult to fit an actual 'hands-on' part in the 90-minute version. What do you think?


      • Pooja Shah
        By Pooja Shah  ~  1 year ago

        Thanks Or for the thoughtful and detailed answer. 

        Yes, regarding the 45-minute vs. 90-minute hands-on question, we need to grill it a bit more. If you plan to have a hands-on part, perhaps the content could be shrunk to cover the key subtopics as a quick start?

        One snippet each for SMS, voice and speech recognition could work best. To prevent delays (real workshop factors), we could ask attendees to download the code and have it ready beforehand. It could then be more of a demonstration, but attendees running the samples on their own machines can do the magic called satisfaction of learning. What do you think?

        • Or Polaczek
          By Or Polaczek  ~  1 year ago

          Pooja - thank you for the comments.

          I agree. I've withdrawn the 480-mins proposal and updated this one to a 90-min workshop with a more detailed breakdown.

          Please let me know what you think :)