Thursday, May 10, 2018

Using FTDNA's API

Edit 1 5/11/2018: Note that this does not mean anything bad for you at all, and you can continue using FTDNA as normal; it's a fantastic site. Also, it is possible that the information I was sent about who is supposed to be able to use the API was incorrect; see the comments section for an interesting discussion on that. I think the big takeaway from this post should be that if you want to use the API for your company, it is worth checking in with FTDNA to ensure you have permission to do so.

Edit 2 5/11/2018: I have been looking into it more, and I believe that this is not in fact an issue at all, which I am very happy about. I am still confused about why FTDNA told me otherwise, and it may be that you are technically still not supposed to use it, but I'm really not sure at this time. Special thanks to the anonymous commenter for clearing things up!

Note: I actually wrote this all the way back on March 12th. I had notified FTDNA that their API was accessible by the public even before that. I am publishing this now because I feel it is my right/duty. They made the decision not to fix it or notify the public that they can use the API, so I think that somebody should. I'm hoping that this post will raise awareness and prompt FTDNA to make a definitive choice about whether they want their API to be publicly available (which I personally think would be fantastic).

Purpose

This document will serve to explain the process through which I found a security vulnerability in the FTDNA web API, as well as the process I went through in reporting said vulnerability and what I would have done differently if given the chance.

Definitions

Some company names and terms used in this document will undoubtedly be unfamiliar to those reading it. This section will serve to clarify said terms.



Company Names

FTDNA: Short for Family Tree DNA. They are estimated to have DNA tested about 850k people, making them one of the most popular DNA testing companies today. I have tested 6 people here, including myself.

Technical terms

API: Short for Application Programming Interface. Allows programs to send requests to servers and receive responses back. For instance, here is an API that, when queried (querying is another word for sending a request), will put random animated cats on your website: http://thecatapi.com/

HTTP GET and POST: GET and POST are HTTP methods, meaning that you can use them with the Hypertext Transfer Protocol to interact in some way with whatever is using HTTP. GET will request data from a source (such as our API) and POST will send data to the source to be processed in some way.
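To make the GET/POST distinction concrete, here is a minimal sketch using Python's standard library. The example.com URL is a made-up placeholder, and nothing is actually sent over the network; the code only builds the two kinds of request and shows which method each one would use.

```python
from urllib.request import Request

# Hypothetical endpoint, used only for illustration; no request is sent.
EXAMPLE_URL = "https://example.com/api/cats"

# A request with no body only asks the server for data, so it is a GET.
get_request = Request(EXAMPLE_URL)

# A request that carries a body sends data to be processed, so it is a POST.
post_request = Request(EXAMPLE_URL, data=b"name=Whiskers")

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```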

DLL: short for Dynamic-Link Library. .dll files hold the code and procedures required by Windows programs.

Variable: In programming, a variable is a name assigned to a value. If a variable is used in a program without ever being assigned a value, an error can be expected.

Object: In .NET programming, this is a block of memory that has been reserved for something. It’s important to note that objects can be stored in variables.

Plaintext: Human-readable strings that make use of the normal alphabet.

JSON: Short for JavaScript Object Notation, it is a format standard used for structuring data for use between different applications.
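As a small illustration of what "structuring data for use between different applications" means, the sketch below turns a Python dictionary into JSON text and back again. The record and its field names are invented for the example and are not taken from any real API.

```python
import json

# Made-up record; the field names are illustrative only.
record = {"name": "Whiskers", "age": 3, "indoor": True}

# Serialize to JSON text, the form that would travel between applications.
text = json.dumps(record, indent=2)
print(text)

# Parse the text back into a Python dictionary on the receiving side.
restored = json.loads(text)
print(restored == record)
```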

Tools and Applications

Postman: Postman is an app for API development. Part of API development is testing, so it also has the useful functionality of allowing you to send GET and POST requests, among other things.

Task Manager: Windows application for monitoring processes, services, and applications.

DNAGedcom Client: a tool which can be used to gather and analyze DNA data from genetic testing companies.

The Initial Problem

I noticed that I was getting a remote server connection error when using DNAGedcom at home. I tested it out on my school’s network and was able to collect the files I wanted. That made me realize that this was a problem with my home network. After trying a variety of solutions (including contacting DNAGedcom’s tech support) I was about ready to give up. I decided to try one last thing before I did, though.

Looking at DNAGedcom’s Files

The following are the steps I took attempting to view DNAGedcom’s main application files:

  • Searched for “DNAGedcom” in the search bar in the Windows 10 taskbar
  • Right-clicked the DNAGedcom file that popped up and selected “open file location”


The application reference file location


  • Was led to an application reference file (shown above). Normally you can just right-click a file like this and select ‘open file location’, but this one didn’t have that option.

Open file location in Task Manager

  • I opened task manager, selected open file location, and was directed to DNAGedcom’s actual files

The DLL

DNAGedcom’s main files

You can see here I have one file highlighted in particular, that being “DNAGedcom.Shared.FTDNA.dll”. I thought to myself, well, that might have some useful information about what is going wrong. So I opened it in Notepad++ and was greeted by this:

All those black characters are non-ASCII.
But if I scroll down all the way to the bottom, you can see they stored the API queries in the DLL as plaintext. I could have also used the Linux ‘strings’ command on the file to find all ASCII text, but figured this was faster than firing up a VM. Highlighted below are the GET requests the application used to query the API (notice the GET methods).
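For anyone without a Linux machine handy, the same idea as ‘strings’ can be roughly sketched in Python. This is only an approximation under a few assumptions: it looks for runs of single-byte printable ASCII (text stored as UTF-16 would need a different pattern), and the sample bytes below are invented stand-ins, not the contents of the real DLL.

```python
import re
from pathlib import Path


def extract_ascii_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return every run of printable ASCII that is at least min_length long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def extract_from_file(path: str, min_length: int = 4) -> list[str]:
    """Read a binary file (for example a .dll) and pull out its ASCII runs."""
    return extract_ascii_strings(Path(path).read_bytes(), min_length)


# Invented sample: binary junk surrounding one readable URL.
sample = b"\x00\x01MZ\x90\x00https://example.com/api/items?page=1\x00\xff\xfeabc\x00GET\x00"
urls = [s for s in extract_ascii_strings(sample) if s.startswith("https://")]
print(urls)
```

Calling `extract_from_file` on a path to the DLL and filtering for lines that start with "https://" would be one way to list candidate API URLs without scrolling through the file by hand.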



Messing With The API

I tried the first link I saw in google like so:
https://www.familytreedna.com/my/family-finder-api/profile?kitNums=B22932&kitNums=B24666&kitNums=B24676

This led me to a login screen. When I tried to log in I got the same error I was getting before, which told me that it was likely this link that was causing my connectivity problems in some way. I still have not figured out how to fix it but I did end up finding out a lot of other stuff. So I tried the next link:

https://www.familytreedna.com/my/family-finder-api/matches?filter3rdParty=false&filterId=0&page=DGPAGE&pageSize=DGSIZE&selectedBucket=0&sortDirection=desc&sortField=relationshipPercentage()&trial=0

This link actually returned an error, but the error was descriptive enough I was able to fix it:

The error, specifically, is “Object reference not set to an instance of an object”. Well, that's an easy fix. Objects are tied to variables, so that must mean that there is a variable in the URL that needs to be set. So I broke down the URL like so:
https://www.familytreedna.com/my/family-finder-api/matches?filter3rdParty=false&filterId=0&page=DGPAGE&pageSize=DGSIZE&selectedBucket=0&sortDirection=desc&sortField=relationshipPercentage()&trial=0

Yellow (https://www.familytreedna.com/my/family-finder-api/): Main API URL
Pink (matches): Specifies main type of data to get (match list)
Light Blue (for example filter3rdParty=false and filterId=0): Variables assigned to values
Bright Blue (DGPAGE and DGSIZE): Variables not assigned to values (just other variables)
Red (&): Ampersands separating variables from one another

The bright blue values are the ones I needed to change, because they are the ones causing the error. After some messing around with the URL in my browser, I found that DGPAGE specifies which page of matches to start on. I set this to 1. I also found that DGSIZE represents the number of people the API will give information for.
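The same breakdown can be sketched in Python with the standard library's URL parser. Two things here are assumptions for the sake of the example: that unfilled placeholders can be recognized by their "DG" prefix, and the page size of 100, which is just an arbitrary value.

```python
from urllib.parse import parse_qsl, urlsplit

TEMPLATE = (
    "https://www.familytreedna.com/my/family-finder-api/matches"
    "?filter3rdParty=false&filterId=0&page=DGPAGE&pageSize=DGSIZE"
    "&selectedBucket=0&sortDirection=desc"
    "&sortField=relationshipPercentage()&trial=0"
)


def unfilled_parameters(url: str, prefix: str = "DG") -> dict[str, str]:
    """Return query parameters whose value still looks like a placeholder."""
    pairs = parse_qsl(urlsplit(url).query)
    return {name: value for name, value in pairs if value.startswith(prefix)}


def fill_placeholders(url: str, values: dict[str, int]) -> str:
    """Swap each placeholder name in the URL for a real value."""
    for placeholder, value in values.items():
        url = url.replace(placeholder, str(value))
    return url


print(unfilled_parameters(TEMPLATE))

# Page 1 comes from the post; the page size of 100 is an arbitrary example.
filled = fill_placeholders(TEMPLATE, {"DGPAGE": 1, "DGSIZE": 100})
print(filled)
```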

Using Postman

I wanted to try to output the information of a huge number of people, which would be fastest if done with a direct GET request. So I opened up Postman, put in the new URL, and received over 160k lines of match-related data for over 4,000 people, for free, as nicely formatted JSON data. This isn’t actually a huge problem privacy-wise, but it’s bad for FTDNA: if lots of people were to use it, it would put a huge load on their servers that they would make no money from. Note that you can only get data on the matches for the kit you are logged in to, so it’s information you could get about those people anyway on the main match page, just with much better formatting.
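For readers who would rather script this than use Postman, here is a rough sketch of the equivalent GET request using only Python's standard library. It rests on several assumptions: that the endpoint still behaves as it did when this was written, that being logged in can be represented by passing a session cookie copied from the browser (the cookie string in the example is a placeholder), and that the response body is JSON. The function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.familytreedna.com/my/family-finder-api/matches"


def build_matches_request(page: int, page_size: int, cookie_header: str) -> Request:
    """Build (but do not send) a GET request for one page of matches."""
    query = {
        "filter3rdParty": "false",
        "filterId": 0,
        "page": page,
        "pageSize": page_size,
        "selectedBucket": 0,
        "sortDirection": "desc",
        "sortField": "relationshipPercentage()",
        "trial": 0,
    }
    # safe="()" keeps the parentheses in sortField exactly as in the original URL.
    url = BASE_URL + "?" + urlencode(query, safe="()")
    headers = {"Cookie": cookie_header, "Accept": "application/json"}
    return Request(url, headers=headers, method="GET")


def fetch_matches(request: Request):
    """Send the request and parse the body as JSON (needs a valid login cookie)."""
    with urlopen(request) as response:
        return json.load(response)


# Placeholder cookie value; a real one would come from a logged-in browser session.
request = build_matches_request(page=1, page_size=50, cookie_header="session=placeholder")
print(request.get_method(), request.full_url)
```

Only `fetch_matches` touches the network, so the request can be inspected before anything is sent.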

Initiating Contact

At this point I realized something might not be quite right. I thought that it would be cool to keep using the API, but I talked to a few friends and they suggested I contact FTDNA just in case. So I contacted FTDNA through their Facebook page (email would have taken longer):



So I sent them a write-up, they thanked me and said they would get it fixed as soon as possible.

What I Would Have Done Differently

After the fact, I realized I should have contacted them immediately instead of waiting a few days. Sometimes companies aren’t especially happy when people mess with things they aren’t supposed to and it could have ended badly if this were a larger issue or if I somehow happened upon information I shouldn’t have. However, FTDNA didn't end up fixing it even several months afterward anyways. Also, I would have made a larger effort to focus on the issue I was having which I still haven’t figured out.

17 comments:

  1. You can do exactly the same thing at both Ancestry and 23andme, using their internal APIs. It is how modern websites function.

    Example (you must be logged in to 23andme, of course):
    https://you.23andme.com/tools/relatives/ajax/?offset=0&limit=100000

    This blog post only risks scaring non-technical users who can't tell if this is an issue to be concerned about or not.

    Replies
    1. This is what I thought at first as well, but when I contacted FTDNA they told me I should not be able to access it. In fact, their error message was changed after I contacted them to say that they had disabled GET requests, but calls to it do still return data. I tried to make it as clear as possible in the original post that this doesn't pose any threat to the user. However, it is still a security issue, as FTDNA stated that they did not wish for non-partnered companies to be able to use it.

    2. Specifically you can see in ftdna's response to me that they said "We do not have any publicly exposed apis for our website, we do have a web api for our partner companies". I am not a partner company and I can use their web api.

    3. Judging by the conversation in the screenshot, it seems like a misunderstanding. They say that it is not for public use, so they will not support it or make it available to the public. ("Public" as in not logged in users, I presume)

      But there is no way of stopping a logged in user from accessing an internal API and retrieving information that is otherwise available on the website. That is true for any website.

      If you can see any information that you should not have access to, that is a different matter.

    4. I see what you are saying, I probably should have posted the second part of the conversation as well which seems to confirm that they do not want everyone to have access to it. Here is another screenshot: https://i.imgur.com/o8RGOpN.png

    5. I don't know why they responded like that, but clearly they didn't understand that you were talking about a standard internal REST API. I often find that API means different things to different people.
      A public API is meant to be used by third parties. For example, 23andme have a public API at https://api.23andme.com
      An internal API is used for different components to communicate, and a REST API is a standard method that many websites use for communication between the web browser and the web server, including all of the big DNA testing companies.

    6. I think there is definitely a possibility you are correct and the person I was corresponding with gave incorrect information. However, there are a couple things to consider:
      1. In the write-up I sent, I gave every step of the process, including the specific GET requests I made and the specific site I sent them to.
      2. After I sent it to them, they did change the error page shown for GET requests to the URL that has the variables instead of actual values. It now shows the following: https://i.imgur.com/fTrL7iB.png

    7. I went ahead and updated the original post to say that it's possible the information I have is wrong, and to refer people to the comments for more information about that. Thanks for the help and insights!

  2. >Specifically you can see in ftdna's response to me that they said "We do not have any publicly exposed apis for our website, we do have a web api for our partner companies". I am not a partner company and I can use their web api.


    Maybe it is a matter of definition.

    I have a TV in my living room, it is not for public use, I share it with my family and invited friends.
    You can still view it through my living room window.
    But whether you should, or are allowed to, do so is a different matter.

    I would love to have access to an FTDNA API though :-)

    Replies
    1. I believe you are probably right, since writing this post I have learned that there are 3rd parties using this web api without issue from ftdna. I will probably release some small tools for fun working with it in the future :)

  3. Hi Renee,
    Thank you for sharing your work! Your information on Mary Brandon, my paternal grandmother is of particular interest!
    Barb

  4. Too bad your efforts may have ruined this for everyone. It was nice when it worked. :(

    Replies
    1. You should still be able to even if they updated the api. I was barely 18 when I wrote this, now I am almost 20. I was definitely not at the point I am now in terms of understanding front facing apis on the web. I just checked and you can still access the api calls they are just different now. It's hardly ruined. If people really want me to I can release new instructions on using it but I have recently become very very sick and had to quit both college and my job, so it would be a long time before I would be able to release it fully.

    2. Here is updated python code for logging in, proving it can still be done.

      import requests
      import base64

      #sign in page
      URL = "https://www.familytreedna.com/sign-in"

      #make request
      r = requests.get(url = URL)

      #get verification cookie
      verToken = r.cookies["__RequestVerificationToken"]

      #get login details
      with open('auth.txt', 'r') as the_file:
          info = the_file.readlines()

      username = info[0].strip()
      password = base64.b64decode(info[1]).decode("utf-8")

      #set request values
      URL = "https://www.familytreedna.com/sign-in"
      PARAMS = {"password" : password, "kitNum" : username}
      COOKIES = {"__RequestVerificationToken" : verToken}

      #make request
      r = requests.post(url = URL, params = PARAMS, cookies = COOKIES)
      print(r.text)

    3. nice code, well documented. Thx!

    4. Okay I did a 'pip3 install requests' to get past the first error.
      Where does one get the auth.txt file?

    5. Hello! The auth.txt file is a sample file you can make on your own. I didn't want to post my actual username and password or risk someone finding it so I used that as a sort of placeholder code. Instead of that you can remove the code under "#get login details" and instead of setting username=info[0].strip(), you instead set it to your actual username. Same with password.

      Sorry for the late reply, I was busy with college. Hope this helps!
