I have read the py-couchdb documentation. I also saw some JS functions for removing documents from a database, so I tried to recreate them for my CouchDB database. After inserting some data (1.2 GB as CSV, ~12 GB in the database, I wonder why...), I tried to run some queries on the dataset.
Selects worked fine, but when I tried to delete specific docs, the queries did not work as expected.
Here are the queries:
map_func = "function(doc) { if (doc['Year'] == 2015) emit(doc._id, null);}"
map_func_2 = "function(doc) { if (doc['Year'] == 2015) emit(null, doc._rev);}"
map_func_3 = "function(doc) { if (doc['Year'] == 2015) emit(doc._deleted='true',null);}"
map_func_4 = "function(doc) { if (doc['Year'] == 2015) emit(doc.deleted='true',null);}"
map_func_5 = "function(doc) { if (doc['Year'] == 2015) emit(doc.deleted=true,null);}"
map_func_6 = "function(doc) { if (doc['Year'] == 2015) emit(doc._deleted:'true',null);}"
Then I ran each function as a temporary query and used map_func to check whether the docs had been deleted.
print "Querying for Deleting Year = 2015"
t = time.time()
db.temporary_query(map_func_3)
print float(time.time() - t)
print "Querying for Year = 2015"
t = time.time()
print len(list(db.temporary_query(map_func)))
print float(time.time() - t)
But none of them actually deletes or hides the specified docs. I also tried another approach, doing the deletions from Python.
for doc in db.all(as_list=True):
    if doc['Year'] == 2015:
        db.delete(doc)
The problem here is that Database.all() with as_list=True loads the entire dataset into memory, and I get out-of-memory errors.
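For context, a map function only reads documents, so assigning `doc._deleted` inside it cannot change what is stored. CouchDB removes a document when a new revision with `"_deleted": true` is written, and the `_bulk_docs` endpoint accepts many such revisions in one request. Below is a minimal sketch of the direction I am considering: building those payloads in small batches so the whole dataset never sits in memory. The helper name `deletion_batches`, the row shape (dicts with `id` and `rev` keys) and the sample data are hypothetical, and the step that actually sends each payload to the server is left out.

```python
import json


def deletion_batches(rows, batch_size=1000):
    """Yield _bulk_docs payloads that mark each row's document as deleted.

    `rows` is any iterable of dicts with "id" and "rev" keys. This shape is
    an assumption; adapt it to whatever the view query actually returns.
    """
    batch = []
    for row in rows:
        # A deletion is a new revision whose body carries "_deleted": true.
        batch.append({"_id": row["id"], "_rev": row["rev"], "_deleted": True})
        if len(batch) == batch_size:
            yield {"docs": batch}
            batch = []
    if batch:
        # Flush the last, partially filled batch.
        yield {"docs": batch}


# Stand-in rows so the sketch runs without a server.
sample_rows = [{"id": "doc-%d" % i, "rev": "1-abc"} for i in range(5)]
payloads = list(deletion_batches(sample_rows, batch_size=2))

# JSON body that would be sent for the first batch.
first_body = json.dumps(payloads[0], sort_keys=True)
```

Because `deletion_batches` is a generator, only one batch is held at a time as long as `rows` is itself produced lazily, for example from a view that emits each matching document's id and revision.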