Data Gathering and Cleansing

Conclusions - GIGO

It's funny to put the conclusions first, but let me do so, so you get an idea of what's happening. The data is open-source and easily obtained from the Harvard Khipu Database project. They have the khipu data in two forms - a limited set of 349 khipus in Excel and a bigger SQL database of about 600+ khipu. So the steps are as follows:

  1. Gather the data. In this case I chose SQL due to the larger data set size
  2. Cleanse the data . This is where it always gets hairy. The data fails in various ways. It took me a while to discover the many, many issues. From khipus with no cords, to khipus with no knots, to khipus with cords that didn't exist, or belong to another khipu, to not knot's that don't exist - it's been a fascinating Zen journey in the perils of data integrity.

    Along the journey, one night, tired and frustrated with data integrity checks, I saw a note in the cord database nudo desanudado. Unknotted knot. That's what this cleansing has been like.

    At the end, the database has 511 (reasonably...) well-formed khipu. I think. The database started off with 650 khipu, lost about 40 for various technical reasons (fragmentary, no cords, no knots, etc.) and then lost another 100 khipu due to data integrity check errors.

Data Gathering

The Khipu Database Project has stored all the khipu measurements in two forms: Excel and SQL files. The SQL files are allmost.... ready for datamining. However, in an attempt to make this project portable (in a software sense, not a cloth one LOL) and to save hassles with a SQL server, etc. I am converting the SQL to CSV files. I note that Urton, prefix the database tables with the Quechua word for warehouse collca (but a Spanish spelling unfortunately). The Quechua spelling is qollqa.

Since the khipu database/tables are so small (a total of 100Mb in SQL statements) I used an open-source mySQL, and TablePlus, a SQL GUI to:

  1. Restore the Khipu Project MySQL Database by concatenating all the SQL files `cat collca.sql > make_khipu_db.sql`* and running the resulting SQL file on the MariaDB MySQL server
  2. Save all the tables and query results of the Khipu Project MySQL Database as CSV files (using TablePlus)

    Voila - we now have a bunch of python pandas DataFrame ready CSV files.

In [2]:
# Load required libraries and intialize Jupyter notebook
# Khipu Imports
import khipu_kamayuq as kamayuq  # A Khipu Maker is known (in Quechua) as a Khipu Kamayuq
import khipu_qollqa as kq

The Harvard Khipu database schema (description image by the Khipu Database Project) is shown below. The python classes reconstruct this schema.

Creating the Khipu database

As said previously, the SQL database tables and query results are stored as a CSV tables instead of as SQL CREATE statements. Key tables include khipu_main, cord and cordcluster. Tables that end with dc are code descriptors for symbolic codes in tables. For example the ascher_color_dc tells you that color MB -> translates to Medium Brown...

In [3]:
%ls ../../data/CSV/
app_config_data.csv               cord_value.csv
archive_dc.csv                    cord_value_components.csv
ascher_canutito_color.csv         cords_to_explore.csv
ascher_canuto_color.csv           deleted_khipus.csv
ascher_color_dc.csv               fiber_dc.csv
ascher_cord_color.csv             field_guide_notes.csv
ascher_cord_color_clean.csv       funky_khipu.csv
ascher_cord_color_frequency.csv   grouping_class_code_20100831.csv
ascher_databook_notes.csv         grouping_class_dc.csv
attachment_dc.csv                 khipu_blob_notes.csv
beginning_dc.csv                  khipu_cluster_sums_df.csv
benford.csv                       khipu_cord_sums_df.csv
canutito.csv                      khipu_defaults.csv
canuto.csv                        khipu_docstrings.csv
canuto_cluster.csv                khipu_main.csv
canuto_cord_flat.csv              khipu_main_clean.csv
cluster_summary.csv               khipu_notes.csv
color_operator_dc.csv             khipu_summary.csv
cord.csv                          knot.csv
cord_classification_dc.csv        knot_clean.csv
cord_clean.csv                    knot_cluster.csv
cord_cluster.csv                  knot_cluster_clean.csv
cord_cluster_clean.csv            knot_type_dc.csv
cord_color_adjacencies.csv        num_pendants.csv
cord_color_notes.csv              pcord_colors_processed.csv
cord_colors_processed.csv         pcord_notes.csv
cord_flat.csv                     pigmentation_dc.csv
cord_level_1.csv                  primar_cord_clean.csv
cord_level_2.csv                  primary_cord.csv
cord_level_3.csv                  primary_cord_attach.csv
cord_level_4.csv                  primary_cord_clean.csv
cord_level_5.csv                  primary_cord_processed.csv
cord_level_6.csv                  quechua_chanka_dict.csv
cord_level_7.csv                  quechua_chanka_syllables.csv
cord_notes.csv                    regions_dc.csv
cord_processed.csv                structure_dc.csv
cord_test.csv                     termiantion_dc.csv
cord_top_level_flat.csv           termination_dc.csv
cord_totals.csv                   urton_khipu_type.csv
cord_type_dc.csv                  x_canuto_color_flat.csv

Data Cleansing

We start by building a virginal object-oriented database (OODB) of khipus (essentially Python Classes). Building the initial Khipu OODB of about 620 khipus takes about 10 minutes.

In [4]:
#Clean files
import shutil
CSV_dir = kq.qollqa_data_directory()
    shutil.copy(f"{CSV_dir}khipu_main.csv", f"{CSV_dir}khipu_main_clean.csv");
    shutil.copy(f"{CSV_dir}primary_cord.csv", f"{CSV_dir}primary_cord_clean.csv");
    shutil.copy(f"{CSV_dir}cord_cluster.csv", f"{CSV_dir}cord_cluster_clean.csv");
    shutil.copy(f"{CSV_dir}cord.csv", f"{CSV_dir}cord_clean.csv");
    shutil.copy(f"{CSV_dir}ascher_cord_color.csv", f"{CSV_dir}ascher_cord_color_clean.csv");
    shutil.copy(f"{CSV_dir}knot_cluster.csv", f"{CSV_dir}knot_cluster_clean.csv");
    shutil.copy(f"{CSV_dir}knot.csv", f"{CSV_dir}knot_clean.csv");
In [5]:
#Build a fresh version of the object oriented database (OODB) that is not "cleansed"
print("Building initial khipu OODB")
all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(clean_build=True).values()]
print(f"Done - built and fetched {len(all_khipus)} khipus")
Building initial khipu OODB
0: 1000498
25: 1000236
50: 1000371
75: 1000043
100: 1000142
125: 1000069
150: 1000420
175: 1000445
200: 1000580
225: 1000641
250: 1000470
275: 1000335
300: 1000265
325: 1000248
350: 1000339
375: 1000183
400: 1000359
425: 1000119
450: 1000290
475: 1000386
500: 1000392
Unable to create khipu id 1000484 - exception 1000484
525: 1000495
550: 1000523
575: 1000552
600: 1000604
625: 1000652
Done - built and fetched 624 khipus

Examining Khipu_Main

Let's start by looking at the big picture - what khipus do we have to work with. What's the "quality" and "integrity" of the data. We've already had one khipu fail - Khipu ID 1000484, known as UR167 or B/3453A from the American Museum of Natural History.

I note that khipu_main.csv (or the equivalent SQL table) has two errors - one, an empty row without any information (khipu id 10000500) and one mislableled investigator name Ur189. I deleted the empty row by hand, and edited the name to UR189 using MS Excel, prior to starting the database loading...

In [6]:
khipu_main_df = pd.read_csv(f"{CSV_dir}khipu_main.csv") 
khipu_main_df = kq.clean_column_names(khipu_main_df)
(635, 22)
Index(['khipu_id', 'earliest_age', 'latest_age', 'provenance',
       'date_discovered', 'discovered_by', 'museum_descr', 'museum_name',
       'nickname', 'museum_num', 'conditionofkhipu', 'region',
       'investigator_num', 'complete', 'created_by', 'created_on',
       'changed_by', 'changed_on', 'duplicate_flag', 'duplicate_id',
       'archive_num', 'orig_inv_num'],
khipu_id earliest_age latest_age provenance date_discovered discovered_by museum_descr museum_name nickname museum_num ... investigator_num complete created_by created_on changed_by changed_on duplicate_flag duplicate_id archive_num orig_inv_num
0 1000498 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN NaN NaN NaN ... AB001 0.0 gurton 1/5/10 10:27 NaN 0000-00-00 00:00:00 0.0 0.0 0.0 NaN
1 1000166 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 6271 ... AS010 0.0 katie 5/24/12 13:33 NaN 0000-00-00 00:00:00 0.0 0.0 0.0 AS010
2 1000167 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 10087 ... AS011 0.0 katie 5/24/12 13:33 NaN 0000-00-00 00:00:00 0.0 0.0 0.0 AS011
3 1000180 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 10217 ... AS012 0.0 leah 5/24/12 13:33 leah 10/21/03 9:59 0.0 0.0 0.0 AS012
4 1000184 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 10086 ... AS013 0.0 leah 5/24/12 13:33 leah 11/10/03 13:07 0.0 0.0 0.0 AS013

5 rows × 22 columns

So we have approximately 630 khipus to start with in the database. Already a few khipu have been culled due to data integrity issues. Most of the issues have to due with cords pointing to the wrong place - for example Pendant Cord 1 belonging to Khipu 1 having a subsidiary cord that is attached to Khipu 2...which in turn has a subsidiary cord attached to Khipu 1, which in turn....

Many of these fields are empty, or uninteresting to our data exploration, so let's build a smaller table:

In [7]:
uninteresting_khipu_columns = ['earliest_age', 'latest_age', 'date_discovered', 'discovered_by', 'complete', 
                               'created_by', 'created_on', 'changed_by', 'changed_on', 'duplicate_flag', 'duplicate_id', 'archive_num']
khipu_df = khipu_main_df.drop(uninteresting_khipu_columns, axis=1)
khipu_df.museum_descr = khipu_df.museum_descr.fillna(value='')
khipu_df.nickname = khipu_df.nickname.fillna(value='')
khipu_df.provenance = khipu_df.provenance.fillna(value='')
khipu_df.provenance = np.where(khipu_df.provenance == 'unknown','Unknown', khipu_df.provenance)
khipu_df.provenance = np.where(khipu_df.provenance == '','Unknown', khipu_df.provenance)
khipu_df.region = khipu_df.region.fillna(value='')
khipu_df.region = np.where(khipu_df.region == 'unknown','Unknown', khipu_df.region)
khipu_df.region = np.where(khipu_df.region == '','Unknown', khipu_df.region)
khipu_df.conditionofkhipu = khipu_df.conditionofkhipu.fillna(value='')
khipu_id provenance museum_descr museum_name nickname museum_num conditionofkhipu region investigator_num orig_inv_num
0 1000498 Unknown NaN NaN Unknown AB001 NaN
1 1000166 Unknown "Niedersächsische Landesmuseum, Hanover, West ... 6271 Unknown AS010 AS010
2 1000167 Unknown "Niedersächsische Landesmuseum, Hanover, West ... 10087 Unknown AS011 AS011
3 1000180 Unknown "Niedersächsische Landesmuseum, Hanover, West ... 10217 Unknown AS012 AS012
4 1000184 Unknown "Niedersächsische Landesmuseum, Hanover, West ... 10086 Unknown AS013 AS013
... ... ... ... ... ... ... ... ... ... ...
630 1000657 Armatambo, Huaca San Pedro Museo Nac., Pueblo Libre 36224 Unknown UR290 UR290
631 1000658 Armatambo, Huaca San Pedro Museo Nac., Pueblo Libre 36268 Unknown UR291A UR291A
632 1000659 Armatambo, Huaca San Pedro Museo Nac., Pueblo Libre 36226 Unknown UR292A UR292A
633 1000660 Armatambo, Huaca San Pedro Museo Nac., Pueblo Libre 36216 Unknown UR293 UR293
634 1000661 Armatambo, Huaca San Pedro Museo Nac., Pueblo Libre 36217 Unknown UR294 UR294

635 rows × 10 columns

Apparently some khipu are in fragmentary condition. Let's remove those for the purpose of this study. Also the orig_inv_num meaning the original author who described the khipu generally matches with the investigator_num Some Ascher descriptions are replaced by Urton descriptions, but on the whole most Aascher descriptions are honored and labeled as such.

In [8]:
fragmentary_khipus_df = khipu_df[khipu_df.conditionofkhipu == "Fragmentary"]
fragmentary_khipu_ids = list(fragmentary_khipus_df.khipu_id.values)
fragmentary_khipu_names = list(fragmentary_khipus_df.investigator_num.values)
print(f"\tfragmentary_khipu_ids: {fragmentary_khipu_ids}")
print(f"\tfragmentary_khipu_names: {fragmentary_khipu_names}")
khipu_df = khipu_df.drop(khipu_df[khipu_df.conditionofkhipu == "Fragmentary"].index)
khipu_df = khipu_df.drop(['conditionofkhipu'], axis=1)
	fragmentary_khipu_ids: [1000454, 1000456, 1000457, 1000458, 1000459, 1000462, 1000465, 1000466, 1000468, 1000469, 1000470]
	fragmentary_khipu_names: ['QU03', 'QU04', 'QU05', 'QU06', 'QU07', 'QU10', 'QU14', 'QU15', 'QU17', 'QU18', 'QU19']

We now have a clean khipu database with 623 khipus to investigate

Examining Primary Cord Data

Most?! khipus have a primary cord. Let's examine the primary cord database

In [9]:
primary_cord_df = pd.read_csv(f"{CSV_dir}primary_cord.csv") 
(635, 16)
0 1000000 1000000 P 0.0 NaN 0.0 26.0 CN K K cbrezine 2011-11-23 19:42:40 NaN 0000-00-00 00:00:00 S NaN
1 1000001 1000001 P 0.0 nudo de comienzo entre 0.0 - 0.5 cm 0.0 16.5 CN K T cbrezine 2011-11-23 19:42:40 NaN 0000-00-00 00:00:00 S nudo de comienzo entre 0.0 - 0.5 cm
2 1000002 1000002 P 0.0 solamente existe cordon principal entre: 0.0-5... 0.0 10.5 CL NaN NaN cbrezine 2011-11-23 19:42:40 NaN 0000-00-00 00:00:00 S solamente existe cordon principal entre: 0.0-5...
3 1000003 1000003 P 0.0 4.0 cm: nudo que une khipu 109B con up Top Cor... 0.0 98.0 CN K K cbrezine 2011-11-23 19:42:40 cbrezine 2003-05-29 09:40:36 S 4.0 cm: nudo que une khipu 109B con up Top Cor...
4 1000004 1000004 P 0.0 65.5 cm: una prolongacion del cordon principal... 0.0 65.5 CN K T cbrezine 2011-11-23 19:42:40 cbrezine 2004-03-03 12:05:20 S 65.5 cm: una prolongacion del cordon principal...

Once again, let's remove uninteresting columns

In [10]:
primary_cord_df = kq.clean_column_names(primary_cord_df)
primary_cord_df = primary_cord_df.drop(['created_by', 'created_date', 'changed_by', 'changed_date'], axis=1)


Two questions immediately are raised. Are there any primary cords that are not attached to a khipu? (In which case we should remove them). The notes for primary cords should be reviewed, as well.

Remove primary cords belonging to fragmentary khipus or to the null row...

In [11]:
print(f"Before: primary_cord_df.shape = {primary_cord_df.shape}")
errant_khipu_ids = list((set(primary_cord_df.khipu_id.values) - set(khipu_df.khipu_id.values)) - set(fragmentary_khipu_ids))
errant_khipu_names = khipu_main_df[khipu_main_df.khipu_id.isin(errant_khipu_ids)].investigator_num.values
print(f"Removing errant_khipu_ids {errant_khipu_ids}")
print(f"Removing errant_khipu_names {errant_khipu_names}")

khipu_ids = khipu_df.khipu_id.values
primary_cord_df = primary_cord_df[primary_cord_df.khipu_id.isin(khipu_ids)]
print(f"After: primary_cord_df.shape = {primary_cord_df.shape}")

primary_cord_khipu_ids = primary_cord_df.khipu_id.values
print(f"Before: khipu_df.shape = {khipu_df.shape}")
khipu_df = khipu_df[khipu_df.khipu_id.isin(primary_cord_khipu_ids)]
print(f"After: khipu_df.shape = {khipu_df.shape}")
Before: primary_cord_df.shape = (635, 12)
Removing errant_khipu_ids [1000594]
Removing errant_khipu_names []
After: primary_cord_df.shape = (623, 12)
Before: khipu_df.shape = (624, 9)
After: khipu_df.shape = (623, 9)

And review primary cord notes:

In [12]:
notes_series = primary_cord_df.notes
notes_series = notes_series[notes_series.notnull()]
for note in notes_series: print(note)
nudo de comienzo entre 0.0 - 0.5 cm
solamente existe cordon principal entre: 0.0-5.0 y 5.5-9.0 cm
4.0 cm: nudo que une khipu 109B con up Top Cord del khipu 109A.
65.5 cm: una prolongacion del cordon principal /o pen. 157
At 7.0 cm there is a knot which appears to unite two khipu
The beginning knot joins this khipu to UR12 and UR15. The knot around 22.5 joins it with UR14. 
This khipu is attached to 257E/UR14 and to 257D/UR13
The primary cord is actually broken into two pieces.  The "marker" at 12.0 cm is actually a break between separate pieces of the quipu which were not necessarily connected at that point.
The main cord is finished with a thread wrapping that extends for 1.0 cm.
1. (Beginning:) $ Nudo de comienzo (cordon doblado)
AT 117.0 cm, nudo final; prolongacion del c.p. por medio de un hilo atado como pendiente.
Total length calculated from last measurement (123.0) plus space (72.0) and an additional 2.0 to account for the group of 9 pendants.
AS207B and AS207C are attached to the primary cord in the gap of 40.5-43.5 cm (as indicated by the knots).
Construction note: The end of the main cord has been cut or cut and wrapped.  This finishing may be intentional and the quipu complete.  At 19.5 cm the main cord has been repaired or joined to another piece of the same cord.
1.  Two cords of 30 cm and 19 cm are knotted together and knotted to the main cord within the 9.5 cm space between pendants 61 and 62.  Both cords appear to have been cut.  The cords are undyed (W) but have marks of being unravelled from some mottled combination.  The cord attachments do not appear to be part of the original khipu construction and there is nowhere on the khipu they seem to come from.
Cords 1 and 2 are linked through the twisted end of the main cord so that they dangle from the end of the main cord.
These fragments were not attached to any primary cord.
The main cord begins with a woven cord ball.  The ball is DB, 1.0 cm thick, and 1.5 cm long.
The twisted ends of the main cords of the 2 parts are tied together with a square knot.
The twisted ends of the main cords of the 2 parts are tied together in a square knot.
See notes for construction of primary cord within wooden bar.
The beginning is twisted, as normal, but then the end of the main cord has been formed into a loop by bending it back on itself and binding it in place.  Two pendants (1-2) are suspended from the end loop.  See Ascher and Ascher 1987:710 for drawing.
Main cord consists of "Br:W S-plyed yarns which are Z-wrapped by Br:W yarns"
This khipu is threaded through a wooden bar (see diagram, Ascher and Ascher 1987:721).  The "groups" represent pendants clustered on loops of the main cord between two holes in the wood.
AS107 is loosely tied around the wooden bar through which AS106 is threaded.
This quipu is threaded through a wooden bar.  (The pendants appear to be listed in order along the cord, with the possible exception of P7-13, because the cord loops through a hole.)
The main cord is finished with a cotton bulb that is 2.0 cm in cross-section.
Another pendant, just after pendant 5, was probably present as indicated by the discoloration on the main cord.
There are no knots on the pendant cords.
Main cord positions were extrapolated for all groups, as only initial positions of "group/top cord/group/space" clusters were recorded.
The main cord is braided D0:W, S-wrapped with D0.
This khipu is threaded through a wooden bar.  See diagram in Ascher and Ascher 1987:815.
This khipu is attached to AS124.  It is looped and tied into the main cord after all the pendant (and knot) before the broken end.
Main cord is Z-plyed and then S-wrapped with a white cord.
The khipu terminates in a fringed woven ball (drawing pg 850 of Ascher and Ascher 1987).  There are 5 ribs on the ball each approx 0.3 cm wide and separated from the next rib by 0.5 cm.  Thus, the ball measures about 4.0 cm around and is 1.25 cm at its widest.
The main cord is made up of 3 cords.
Attached to wooden bar
The beginning is both knotted and twisted.
About 20.0 cm of the main cord is tied in one large knot.
Primary cord appears to have previously been cord-wrapped (s-wrapping); however, the cord-wrapping has been removed (leaving its impression and some discoloration).  Removing the cord-wrapping would have necessitated removing pendant strings and reattaching them later.
Khipu 32.30.30/53 is composed of two khipus tied together (A & B).  Pendants 24 and 25 appear to be a single string, looped through the end of khipu A.
Main cord: three constituents
Main cord:  3 constituents: S-spun, Z-plied, S final ply.
Main cord: 2 components
Primary cord is knotted around the primary cord of khipu AS033A.  The first three pendant strings are outside the knot, thus it is difficult to know how the pendant string directions of these three relate to that of the other pendant strings of this khipu.
thickness about 0.2 cm
thickness is about 0.2 cm
Main cord of Khipu AS033F now passes through plies of main cord of khipu AS033G.  At this end, primary cord of AS033G passes through plies of AS033F.
The main cord was passed through its twisted end to form a loop of 1.0 cm (pendants 1-3 are on this loop), see diagram in Ascher and Ascher 1987:897.
Pendants are actually attached to cord running through the bar; cord is brown, Z-spun, S-plied.
The main cord is finished by being bent back and inserted through its own strands.
Main cord is Z-plyed BB:W which is S-wrapped with BB cord.
Primary cord: brown and white cords are first braided and then the braid is wrapped with a white string (S-wrapped), see photo
The main cord was S-spun AND S-plyed.
Main cord is Br:W S-plyed then Z-wrapped with Br:W cords
Main cord: three strand, Z-spun, S-plyed
The end of the main cord is bent back and inserted through the strands of the cord.
The main cord of the quipu is formed into a large loop by having its knotted end passed through its twisted end.  This loop occurs 2.0 cm after the group of pendants (leaving 7.0 cm after the loop).
The B main cord (AS162A) is extended by a W main cord (AS162B).  The W main cord is formed into a large loop by being passed through its own twisted end.  It has also been passed through the twisted end of the B main cord so that it dangles from the end of the B cord.  (See diagram in Ascher and Ascher 1987:1001.)
The B main cord (AS162A) is extended by a W main cord (AS162B).  The W main cord is formed into a large loop by being passed through its own twisted end.  It has also been passed through the twisted end of the B main cord so that it dangles from the end of the B cord.  (See diagram in Ascher and Ascher 1987:1001.)
Main cord is Br+W and Z-ply
The main cord is finished by being bent back on itself and wound with B colored cord for 2.0 cm (see drawing Ascher and Ascher 1987:1100).
The twisted end of the main cord was folded back and secured with a white cord wrapping.
Thickness is about 0.24 cm
1 AB + 2 YB
Any indication of the color names "Blue" or "Red" in the Ascher notes are only estimates.  No references were made to the color charts.  In this case, BG represents "Blue" and RM represents "Red."
There is a loop attachment at 6.0 cm.
The color designation W/RM refers to a white cord with a thin thread wrapped around it.  The thread wrapping forms a 0.5 cm wide band from 2.5 cm - 3.0 cm.
Start of main cord is turned back and inserted through itself forming a closed loop of 1.0 cm.
Group 5 consists of a loop attachment holding 4 pendant cords.
thickness is about 0.4 cm
almost completely broken in the space between groups 19 and 20 and broken in the space between groups 27 and 28
Thickness ranges from 0.168 to 0.23 cm.
two beginning knots at 2.5 cm
Beginning and end of primary cord are knotted together so that primary cord forms a loop. 
Primary cord has complex color patterning typical of Puruchuco khipu.  The final cord has three components: KB, W, and W-AB.  These three are plied together.  Before final plying, primary cord has two components: one of KB interlocked with W at the doubling point, one of AB interlocked with W at the doubling point.  The presumption is that the W and AB cord was doubled and plied Z, then the KB W cord was doubled and all three were plied together S.  For sketch, see original notes.  Color shorthand KB-W-(W-AB).  
Primary cord probably had another color previously
Dark brown plies have disintegrated.  Color is more accurately notated as (AB-CB) - AB.  All of these elements are the same size; there are about 6 elements in each bundle.  
The brown color is disintegrating in places.
Final cord has three components: one solid, two barber pole: MB- (CB-W)-(CB-W).  
Each component of the primary cord has 9 2-ply components.  
Very little of primary cord visible. 
Typical Puruchuco primary cord structure: two plain components, one barberpole component.  W - KB - (W-KB)
Characteristic Puruchuco primary cord structure: (AB-KB) - AB-KB
spiralled with knotted end inside
Urton thinks primary cord is olive green; Brezine sees more blue, such as GL.  Characteristic Puruchuco primary cord arrangement: (TG-MB) - TG - MB.  Blue portions of cord have 12-13 2-ply  components; brown has 26 2 ply components, counted at ravelled end. 
This cord would usually be denoted RL:W, mottled.  It is composed of four strands of RL and W barberpoled. (RL-W) x 4
Characteristic Puruchuco patterning: (W-KB) - W - KB
Characteristic Puruchuco cord patterning: (AB-W) - W - MB.  (Browns may be the same color, hard to tell).  All four components are the same size.  
Primary cord tapers towards knot. 
(AB-AB) - (GG-W)
The tassel is of red camelid  2-ply S yarns; the cord itself is of cotton.  The join between the tassel and the cord is wrapped with brown cotton thread for about 3 cm.  Wrapping forms a cone shape.  Measurements begin at the far edge of the wrapping.  The tassel is about 2.5 cm long; tassel length is variable and is not included in cord measurement.  A dark brown element has been lost from the primary cord.  All primary cord components are the same size.  (AB-AB-AB-KB)-(AB-AB)
Primary cord has two components, each of which has 4 (?) elements.  Each of the final two includes AB, MB, HB.  Ascher description would be AB:MB:HB for these components; more accurately described as (AB-MB)-(AB-HB).   There is an "anomolous knot or bulge" in primary cord in 1 cm space between groups at 34 cm and 37 cm (between cords 104 and 105).
Cord is 3-ply final S.  Each ply is RL-AB-GG.  Final effect RL:AB:GG.  
Three ply cord, final S.  End of cord is wrapped for 1.5 cm with thread color RL.  The plies are colored (MB:W) - (MB:W) - MB. 
Three ply cord.  (W-RL-KB) - (W-RL-KB) - AB.  The end of the cord is wrapped with KB thread for 1.5 cm. 
The beginning of the cord is inside the wrapping which attaches A, D & E together.  The end is wrapped with white thread for 4.0 cm.  The cord is 3-ply
Three ply cord, W-W-MB.  End of cord is wrapped with AB thread for 3 cm.  Beginning of cord is raveled.  At 20.5 - 22.5 cm is an MB wrapping which joins A, D and E.  
Three components: AB - AB - (LG:AB)
The doubled end of the primary cord is wrapped for 2.5 cm with white thread. The borla is attached to the double end and is strong yellow (SY), probably camelid.   The cord itself is of three components: (AB - W) - AB - W
two elements
The cord has three components:  2 of GG -AB, one of KB - AB.  These three are plied together to create the final cord.  The borla is of red and yellow camelid, approx. 2.0 cm long. From 2.0 to 3.5 the primary cord is wrapped with thread. 
The dark brown color is disintegrated.  At 45.0 cm two of the plies are tied off in a knot.  Cord has four components: W - W - AB - KB, all barber poled together. 
cord has two components. 
Three strand: 2 W strands and one MB:W strand ( all Z-ply ) final S ply
10- In general, the subsidiaries of pendants on khipu UR35 are oriented so that those on adjacent pendants go in opposite directions. This should/may form paired sets of pendants. However, this feature seems to be more common on the first half of the khipu than the second half. 
1- Between groupings 5 and 6 there are loose cords which are knotted together in an overhand knot, then knotted loosely around the primary cord. These cords may be loose subsidiaries and are not included in the pendant count. 
This khipu has a loose string with 4 subsidiaries (see notes).
There are an additional 3 strings tied on with red thread between pendants 2 and 4.
Some pendants have colored threads introduced during the plying, producing a change of color down the length of the string.  This is not a spike.  The threads are camelid with vivid red.
Pendant 13 passes through the looped end of khipu UR117B.
Connected to pendant #13 of UR117A.
The doubled beginning of this cord interlocks with UR117B at 34.0 cm of UR117B.
The doubled beginning of this cord connects with UR117C.
All of the top cords are looped through the attachments of their group pendants.
-The beginning of this cord is ravelled with a knot connecting it to UR56B.
Khipu UR56C is lashed onto this khipu beginning at 29.0 cm until the end at 36.0 cm.
The termination end of this khipu is ravelled/broken.
UR58A and UR58B appear to be two parts of a once much larger khipu containing some 106 cords (58 + 48 = 106).
Attached to UR131A and UR131C.  
at twisted (doubled) end, khipu is spliced to UR028.  
Knot which joins UR024 to URo20 is at 30.5 cm from "beginning" end. 
At beginning end, pCord is knotted to UR024.  At the other end, it is knotted to other khipu.  (Which one?)
At 1.5 cm the cord is knotted with UR023.  The termination end is knotted and raveled. 
Thickness = 0.15
Thickness = 0.21 cm
thickness = 0.22 cm
thickness = 0.13 cm
thickness = 0.28 cm
Between 15 - 28.5 cm, primary cord bears knots:
Top Cord T24 and cords 144-149 are not attached.  They have been included here to preserve linearity of the khipu.
Main cord has three strands, each s-spun/z-ply; all three final ply = S.
Beginning knot at 3.0 cm.
This khipu has an unusual structure.  Illustration is given
Primary cord tied into a loop.
Primary cord tied into a loop.
Primary cord is knotted:
Structure of primary cord is a 4-sided braid.  Beginning and end descriptions are not noted.  Number of elements in braid unknown.  From 3.0 to 3.5 cm, there is a thread wrapping of MB.  
Thickness is the average of four readings: 4.6, 2.79, 4.7, 3.5.   There is a knot in the primary cord from 6.0 - 7.5 and another one from 32.0 - 33.0.  
thickness listed as .22-.25
thickness = .08-.11
thickness listed as : 4.7/3.5/4.3
Beginning: "needlework bundle":  wrapping in brown and white stripes of ~.5cm ea. Wrapped over a core of cords. 
Structure: braided (over a core)
Thickness: 6.3/6.27/6.5 (mm?)
Thread wrapped with colored camelid threads
end: knotted with twisted tail
The last pendant is attached through the twisted end of the cord and falls at the very end. How  can a cord have two twisted ends? We were unable to determine...
broken @ 7.ocm
Begins with a wrapped end (2cm); to fiirst cord group=4.5cm
3 elements, each with many single plies
No length specified... last group at 31.0cm, 3 cords, and space of 3.0cm
thickness: .35/.44/.38
Thickness: .47/.45
Dark brown almost entirely disintegrated. Remaining primary cord almost entirely covered by pendants. Broken both ends.
twisted end
thickness: .19/.15
thickness: .26/.20/.17
thickness: .25/.26/.25
Thickness: .31/.38
thickness: .17/.21/.22/.15
thickness: .38/.35/.38
0.0-1.5 cm ravelled end to thread-wrap (AB)
No information provided for primary cord; total length estimated.
possibly vegetal fiber in the last 8.0 cm
Space of 6.0 cm to knot, space of 7.5 cm from knot to cord group.
Primary Cord looks to be four khipus knotted together.  For the purposes of this data entry, we assume that it is one primary cord looped into four sections, which we have entered as looped pendant cords with subsidiary cords.  Please refer to original record sheet for drawing.  
Both needlework bundles have tassels
Primary cord of UR 205 inserted between beginning and first pendant group
Twisted end, space of 5.0 cm
Primary cord is a looped thread whose free end is wound around the primary cord of UR 205 and passes through the opening in the twisted end of UR 204 (see diagram)
At 1.0 cm connection with decorative cord (see observations) space of 2.5 cm
Final Twist:
Attached to UR228 at 2.0 and 52.0 cm. 
Knotted attachments to UR227 at 1.5 and 72.5 cm
Termination End listed as "wrapped"
Final twist: Braided
Beginning end spliced with UR 233
Attached to khipu UR 244 at 7.0 cm.
2.0 cm. wrapping and attachment with UR 243

Cords and Cord Clusters

A few khipus have no cords. Let's eliminate the zero cord khipus

In [13]:
zero_cord_khipu_ids = [aKhipu.khipu_id for aKhipu in all_khipus if aKhipu.num_pendant_cords()==0]
zero_cord_khipu_name = [kq.khipu_name_from_id(aKhipu.khipu_id) for anId in zero_cord_khipu_ids]
print(f"Removing zero_cord_khipu_name {zero_cord_khipu_name}")
print(f"Before: khipu_df.shape = {khipu_df.shape}, Zero cord ids: {len(zero_cord_khipu_ids)}")
khipu_df = khipu_df[~khipu_df.khipu_id.isin(zero_cord_khipu_ids)]
print(f"After: khipu_df.shape = {khipu_df.shape}")
Removing zero_cord_khipu_name []
Before: khipu_df.shape = (623, 9), Zero cord ids: 0
After: khipu_df.shape = (623, 9)

Do the same for cords, cord clusters, and ascher_cord_colors

In [14]:
valid_khipu_ids = list(set(khipu_df.khipu_id.values) & set(kq.cord_cluster_df.khipu_id.values))
print(f"Before: cord_cluster_df.shape = {kq.cord_cluster_df.shape}")
cord_cluster_df = kq.cord_cluster_df[kq.cord_cluster_df.khipu_id.isin(valid_khipu_ids)]
print(f"After: cord_cluster_df.shape = {cord_cluster_df.shape}")

cord_df = pd.read_csv(f"{CSV_dir}cord.csv") 
cord_df = kq.clean_column_names(cord_df)
cord_df = cord_df.drop(['created_by', 'created_on', 'changed_by', 'changed_on'], axis=1)

print(f"Before: cord_df.shape = {cord_df.shape}")
cord_df = cord_df[cord_df.khipu_id.isin(valid_khipu_ids)]
print(f"After: cord_df.shape = {cord_df.shape}")

ascher_cord_color_df = pd.read_csv(f"{CSV_dir}ascher_cord_color.csv") 
ascher_cord_color_df = kq.clean_column_names(ascher_cord_color_df)
ascher_cord_color_df = ascher_cord_color_df.drop(['created_by', 'created_on', 'changed_by', 'changed_on'], axis=1)

# Ascher cord colors also point to primary cords (see pcord_flag)
print(f"Before: ascher_cord_color_df.shape = {ascher_cord_color_df.shape}")
valid_cord_color_ids = list(set(cord_df.cord_id.values) | set(primary_cord_df.pcord_id.values))
ascher_cord_color_df = ascher_cord_color_df[ascher_cord_color_df.cord_id.isin(valid_cord_color_ids)]

print(f"After: ascher_cord_color_df.shape = {ascher_cord_color_df.shape}")

# Many cords (1 in 6!) have NaN as their attached_to. What's up with that?
Before: cord_cluster_df.shape = (15798, 18)
After: cord_cluster_df.shape = (15753, 18)
Before: cord_df.shape = (56870, 21)
After: cord_df.shape = (55651, 21)
Before: ascher_cord_color_df.shape = (58609, 23)
After: ascher_cord_color_df.shape = (57290, 23)
(9253, 21)

Some cords have missing parents...

In [15]:
has_cord_parents_mask = cord_df.pendant_from.isin(cord_df.cord_id.values)
has_pcord_parents_mask = cord_df.pendant_from.isin(primary_cord_df.pcord_id.values)
has_parents_mask = (has_cord_parents_mask | has_pcord_parents_mask)
num_orphan_cords = sum(~has_parents_mask)
print(f"# of cords missing parents = {num_orphan_cords}")
print(f"Before: cord_df.shape = {cord_df.shape}")
cord_df = cord_df[has_parents_mask]
print(f"After: cord_df.shape = {cord_df.shape}")
# of cords missing parents = 288
Before: cord_df.shape = (55651, 21)
After: cord_df.shape = (55363, 21)

Cord Clusters with Incorrect Cord Pointers

Data Integrity Failure Check

By comparing the pendant_from fields of cords versus the cord_id of clusters, I discovered that 44 khipu have cord clusters that point to cords that don't belong to the khipu. For example, look at UR181/1000491 which has a cord cord_id=3052039 whose pendant_from 1000592 actually points to UR254/1000592

UR003 and UR149

Two of the khipus, UR003 and UR149 have excel files. On viewing the Excel files, I find that UR003 has 371 cords that have something in their fields, and a total of 146 cords that say nothing, while the database says it has 758 directly attached pendants, and 761 cord cluster pendants. UR149 says it has 256 to 265 cords, but the Excel spreadsheet says it has 272 cords. Clearly something's wrong.

As a data "safety measure", we could/should remove these khipus ....

In [16]:
def funky_check(aKhipuList):
    funky_ids = [ ]
    funky_names = [ ]
    funky_cord_cluster_cords = [ ]
    funky_attached_cords = [ ]
    for aKhipu in aKhipuList:
        #if aKhipu.is_funky_khipu():
        if aKhipu.num_cc_cords() != aKhipu.num_attached_cords():
    the_funky_df = pd.DataFrame({"khipu_id": funky_ids, "name": funky_names,
                                "#cord_cluster_cords":funky_cord_cluster_cords, "#attached_cords":funky_attached_cords})
    return the_funky_df

funky_df = funky_check(all_khipus)

print(f"Before funky removal: khipu_df.shape = {khipu_df.shape}")
khipu_df = khipu_df[~khipu_df.khipu_id.isin(funky_df.khipu_id.values)] 
print(f"After funky removal: khipu_df.shape = {khipu_df.shape}")
Before funky removal: khipu_df.shape = (623, 9)
After funky removal: khipu_df.shape = (533, 9)

Knots and Knot Clusters

First remove knot clusters and knots from previously eliminated khipu.

Then, a few khipus have no knots. Let's eliminate the zero knot khipus.

In [17]:
knot_cluster_df = pd.read_csv(f"{CSV_dir}knot_cluster.csv") 
knot_cluster_df = kq.clean_column_names(knot_cluster_df)
knot_cluster_df = knot_cluster_df.drop(['created_by', 'created_on', 'changed_by', 'changed_on'], axis=1)
print(f"Before: knot_cluster_df.shape = {knot_cluster_df.shape}")
knot_cluster_df = knot_cluster_df[knot_cluster_df.cord_id.isin(cord_df.cord_id.values)]
print(f"After: knot_cluster_df.shape = {knot_cluster_df.shape}")

knot_df = pd.read_csv(f"{CSV_dir}knot.csv") 
knot_df = kq.clean_column_names(knot_df)
knot_df = knot_df.drop(['created_by', 'created_on', 'changed_by', 'changed_on'], axis=1)
print(f"Before: knot_df.shape = {knot_df.shape}")
knot_df = knot_df[knot_df.cord_id.isin(cord_df.cord_id.values)]
print(f"After: knot_df.shape = {knot_df.shape}")
Before: knot_cluster_df.shape = (63287, 8)
After: knot_cluster_df.shape = (60027, 8)
Before: knot_df.shape = (120331, 11)
After: knot_df.shape = (116193, 11)

Then remove khipus that have cords with no knots.

In [18]:
cord_ids = list(knot_df.cord_id.unique())
knotty_khipu_ids = list(cord_df[cord_df.cord_id.isin(cord_ids)].khipu_id.unique())
zero_knot_khipu_ids = khipu_df[~ khipu_df.khipu_id.isin(knotty_khipu_ids)].khipu_id.values
zero_knot_khipu_names = khipu_df[khipu_df.khipu_id.isin(zero_knot_khipu_ids)].investigator_num.values
print(f"Removing zero_knot_khipu_names {zero_knot_khipu_names}")
print(f"Before: khipu_df.shape = {khipu_df.shape}")
khipu_df = khipu_df[khipu_df.khipu_id.isin(knotty_khipu_ids)]
print(f"After: khipu_df.shape = {khipu_df.shape}")
#Remove cords and cord_clusters that have no khipus associated with them as a result of all this deletion
cord_cluster_df = cord_cluster_df[cord_cluster_df.khipu_id.isin(khipu_df.khipu_id.values)]
cord_df = cord_df[cord_df.khipu_id.isin(khipu_df.khipu_id.values)]
Removing zero_knot_khipu_names ['AB001' 'AS025' 'AS130 B' 'AS190' 'HP025' 'HP026' 'HP028' 'HP048' 'HP055'
 'QU01' 'QU08' 'QU09' 'QU11' 'QU13' 'QU16' 'UR044' 'UR070' 'UR071' 'UR082'
Before: khipu_df.shape = (533, 9)
After: khipu_df.shape = (513, 9)

Some knot_clusters fail integrity checks for num_knots field. A manual/code fix is to reset num_knots to correct length of knots in database table.

Knot Cluster 1000036 fails integrity check - num_knots: 8 != len(self._knots): 1
    knot_cluster_id: 1000036, cord_id: 3000030, khipu_id: 1000002, khipu_name: UR020
Knot Cluster 1016227 fails integrity check - num_knots: 1 != len(self._knots): 4
    knot_cluster_id: 1016227, cord_id: 3016537, khipu_id: 1000175, khipu_name: AS056
Knot Cluster 1017046 fails integrity check - num_knots: 2 != len(self._knots): 1
    knot_cluster_id: 1017046, cord_id: 3017118, khipu_id: 1000185, khipu_name: AS014
Knot Cluster 1017096 fails integrity check - num_knots: 3 != len(self._knots): 1
    knot_cluster_id: 1017096, cord_id: 3017132, khipu_id: 1000185, khipu_name: AS014
Knot Cluster 1022353 fails integrity check - num_knots: 1 != len(self._knots): 2
    knot_cluster_id: 1022353, cord_id: 3021924, khipu_id: 1000275, khipu_name: UR089
Knot Cluster 1041878 fails integrity check - num_knots: 1 != len(self._knots): 0
    knot_cluster_id: 1041878, cord_id: 3040011, khipu_id: 1000472, khipu_name: UR165
Knot Cluster 1041882 fails integrity check - num_knots: 5 != len(self._knots): 6
    knot_cluster_id: 1041882, cord_id: 3039869, khipu_id: 1000472, khipu_name: UR165

Save Cleansed DataFrames

Finally, we save the cleaned DataFrames (with one last integrity check) and rebuild the database. Building the final Khipu OODB takes about 10 minutes.

In [19]:
primary_cord_df = primary_cord_df[primary_cord_df.khipu_id.isin(khipu_df.khipu_id)]
cord_cluster_df = cord_cluster_df[cord_cluster_df.khipu_id.isin(khipu_df.khipu_id)]
cord_df = cord_df[cord_df.khipu_id.isin(khipu_df.khipu_id)]
ascher_cord_color_df = ascher_cord_color_df[ascher_cord_color_df.khipu_id.isin(khipu_df.khipu_id)]
knot_cluster_df = knot_cluster_df[knot_cluster_df.cord_id.isin(cord_df.cord_id)]
knot_df = knot_df[knot_df.cord_id.isin(cord_df.cord_id)]
In [20]:
# Refresh in-memory databases
    print("Building final khipu object-oriented database")
    all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(clean_build=BUILD_FRESH_OODB).values()]
    print(f"Done - processed {len(all_khipus)} khipus")
Building final khipu object-oriented database
0: 1000166
25: 1000213
50: 1000181
75: 1000054
100: 1000145
125: 1000165
150: 1000424
175: 1000579
200: 1000000
225: 1000238
250: 1000250
275: 1000348
300: 1000356
325: 1000073
350: 1000124
375: 1000320
400: 1000401
425: 1000492
450: 1000509
475: 1000536
500: 1000598
Done - processed 511 khipus
In [21]:
# Another Integrity check...
all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus().values()]
funky_khipu_df = funky_check(all_khipus)

Conclusion - Expurgated Khipus

A review of which khipus were removed, and why.

In [22]:
deleted_khipus_df = pd.read_csv(f"{CSV_dir}deleted_khipus.csv")

# Integrity check
original_khipu_df = pd.read_csv(f"{CSV_dir}khipu_main.csv") 
original_khipu_ids = set(original_khipu_df['KHIPU_ID'].values)
final_khipu_ids = set(kq.khipu_df.khipu_id.values)
removed_khipu_ids = sorted(list(set(list(original_khipu_ids - final_khipu_ids))))
if len(removed_khipu_ids) != deleted_khipus_df.shape[0]: 
    print("Need to update deleted_khipus.csv")
print(f"Removed {len(removed_khipu_ids)} khipus due to lack of cords, knots, or integrity check failures")

removed_khipu_names = list(original_khipu_df[original_khipu_df.KHIPU_ID.isin(removed_khipu_ids)].INVESTIGATOR_NUM.values)
removed_urton_khipus = [aName for aName in removed_khipu_names if aName.startswith("UR")]
print(f"Removed {len(removed_urton_khipus)} Urton Khipus, two (UR003, and UR149) due to data integrity check failure:")

print("\nRemoved following Database Khipus that have associated Excel files:")
from pathlib import Path
path = Path(CSV_dir)
XLS_dir = f"{path.parent}XLS/"
excel_files_removed = 0
for aName in removed_khipu_names:
    excel_name = XLS_dir + aName+ ".xls"
    if os.path.exists(excel_name): 
        excel_files_removed += 1

print(f"{excel_files_removed} excel files removed")
Removed 122 khipus due to lack of cords, knots, or integrity check failures
Incorrect cord pointers    83
Zero knots                 17
Fragmentary                11
Zero cords                 10
Missing Primary Cord        1
Name: Reason_For_Removal, dtype: int64
Removed 53 Urton Khipus, two (UR003, and UR149) due to data integrity check failure:
['UR039', 'UR044', 'UR050', 'UR052', 'UR054', 'UR055', 'UR070', 'UR071', 'UR082', 'UR110', 'UR112', 'UR144', 'UR158', 'UR167', 'UR190', 'UR193', 'UR196', 'UR206', 'UR209', 'UR251', 'UR252', 'UR253', 'UR254', 'UR255', 'UR257', 'UR258', 'UR259', 'UR260', 'UR261', 'UR262', 'UR263', 'UR266', 'UR267A', 'UR267B', 'UR268', 'UR269', 'UR270', 'UR271', 'UR272', 'UR273A', 'UR273B', 'UR274A', 'UR275', 'UR276', 'UR277', 'UR278', 'UR279', 'UR280', 'UR281', 'UR284', 'UR288', 'UR292A', 'UR293']

Removed following Database Khipus that have associated Excel files:
0 excel files removed

122 khipus were deleted. A log of these files, their names, and their reasons for being deleted is kept in the CSV folder under deleted_khipus.csv. As you can see above, the code checks to see if the log needs to be updated.

83 khipus were deleted due to data integrity checks failing. These checks often fail on a consecutive series of 4 or 5 khipus.

38 khipus exhibited zen-like existence with unknotted knots or uncorded cords... (0 knots or 0 cords)

53 Urton Khipus were removed (mostly due to being fragmentary, or zero cords or zero knots)

10 of the Khipus removed has associated Excel files. Two (UR003, UR149) due to integrity check failures with cords and cord clusters.

5 Khipus had knot clusters with bad num-knots counts: AS014, AS056, UR020, UR089, and UR165.
These were not deleted, since the object-oriented Python classes can fix this at creation time.

Some knots had Nan (not a number) or other missing information for type_code (i.e. single, figure-8, long knot, etc.) and num_turns.
These empty fields now default to 'U' for empty type code, and 0 for num_turns of the knot.

Nudo desanudado (the untied knot ) - from an actual cord note in the database

Have we understood the zen koan yet?